GRAND BALLROOM A/B


8:20am 

Opening  Remarks 

H.  Scott  Hinton,  University  of  Colorado-Boulder,  General  Chair 

GRAND  BALLROOM  A/B 


8:30am-10:00am

OMA  •  Optical  Computing  Systems 

H. Scott Hinton, University of Colorado-Boulder, Presider

8:30am  (Invited) 

OMA1 • Optoelectronic technology for real world computing,
Hiroyoshi Yajima, Electrotechnical Laboratory, Japan. Optoelectronics
technologies developed in the real world computing program,
and the joint optoelectronic project, which improve the availability
of novel prototype devices, are described. (p. 2)

9:00am 

OMA2 • Implementation of a 16-channel sorting module, Douglas
A. Baillie, Frank A. P. Tooley, Simon M. Prince, Nicola L. Grant,
Julian A. B. Dines, Marc P. Y. Desmulliez, Mohammad R. Taghizadeh,
Heriot-Watt Univ., U.K. This paper will present experimental details
of a sorting module demonstration system. The system implements
the bitonic sort, based on Batcher's algorithm, using a perfect
shuffle. A re-circulating rather than pipelined arrangement
is used to minimize hardware requirements to two smart-pixel
chips. (p. 5)

9:15am 

OMA3 • Massive optical interconnections (MOI): interconnections
for massively parallel processing systems, S. Araki, M. Kajita, K.
Kasahara, K. Kubota, K. Kurihara, T. Suzaki, NEC Corp.; I. Redmond,
E. Schenfeld, NEC Research Institute. The architecture, design, and
performance of a 64-port, free-space optical interconnection network
using an interconnection-cached routing for massively parallel
processing is described. (p. 8)

9:30am  (Invited) 

OMA4 • Intelligent optical backplanes, Ted Szymanski, McGill
Univ., Canada. Intelligent optical backplanes can enhance computing
and communications architectures by simultaneously transporting
and processing digital data at aggregate terabit rates.
Prospects for intelligent optical backplanes and smart-pixel arrays
will be described. (p. 11)

GRAND  BALLROOM  C 


10:00am-10:30am
Coffee  Break/Exhibits 


GRAND  BALLROOM  A/B 
10:30am-12:00m 

OMB  •  Digital  Optical  Computing 

Miles  Murdocca,  Rutgers  University,  Presider 

10:30am (Invited)

OMB1 • Massively parallel processing (MPP) with optical interconnections:
what can be, should be, and must not be done by
optics, Eugen Schenfeld, NEC Research Institute. Optics has made
many promises of becoming the communication technology of
choice for MPP. However, most of the previous directions have failed.
We will suggest a new approach for optics to have a real merit in
MPP applications. (p. 16)

11:00am

OMB2 • Two-layer image processing system incorporating integrated
focal plane detectors and through-wafer optical interconnect,
D. Scott Wills, Nan Marie Jokerst, Martin Brooke, April Brown,
Georgia Institute of Technology. This paper outlines an extremely
dense image processing system which combines integrated GaAs
and InGaAsP thin-film optoelectronic devices with Si-based VLSI
digital processing processors. (p. 19)

11:15am 

OMB3 • Multiprocessor architectures using POPS interconnection
networks, James P. Teza, Donald M. Chiarulli, Steven P. Levitan,
Rami G. Melhem, G. Gravenstreter, Univ. Pittsburgh. We present
the design and simulation of a highly scalable optoelectronic multiprocessor
based on a partitioned optical passive star (POPS) topology
and state-sequence control. (p. 23)

11:30am

OMB4 • Decomposition method for matrix-addressable microlaser
arrays, Hans Raj Nahata, Miles Murdocca, Rutgers Univ. An algorithm
for decomposing arbitrary patterns into a minimal set of
subpatterns that are applied in succession to a matrix-addressable
microlaser array is presented. (p. 26)

11:45am 

OMB5  •  Routing  algorithm  for  a  circuit-switched  optical  extended 
generalized  shuffle  network,  Clare  Waterson,  B.  Keith  Jenkins,  Univ. 
Southern  California.  A  parallel  routing  algorithm  for  circuit-switched 
combining  extended  generalized  shuffle  (EGS)  networks  is  presented. 
Simulations  show  that  algorithm  time  complexity  is  logarithmic  in 
network  size.  (p.  29) 

12:00m-1:30pm
Lunch  on  Own 

GRAND  BALLROOM  C 


1:30pm-3:00pm

OMC  •  Poster  Session:  1 

OMC1 • VACT: Optical parallel implementation of fuzzy logic and
visualization of its results with digital halftoning, Tsuyoshi Konishi,
Jun Tanida, Yoshiki Ichioka, Osaka Univ., Japan. We propose a novel
method called the visual area coding technique (VACT) for optical
implementation of fuzzy logic with the capability of visualizing the
processed results. (p. 34)


MARCH 13, 1995


OMC2 • Robust light bullet dragging logic, Robert McLeod, Kelvin
Wagner, Steve Blair, Univ. Colorado-Boulder. Vector electromagnetic
simulation of optical logic based on colliding 3D solitons in
non-Kerr media demonstrates tolerance to angular, positional, and
timing alignment and energy variations. (p. 37)

OMC3 • Digital optical pipeline cellular automata arithmetic unit,
Alastair D. McAulay, Lehigh Univ. The multiplication of images in
160 fs was recently demonstrated by means of four-wave mixing in
a new polymer material. We present a conceptual method of using
such a material in a loop to perform pipeline digital arithmetic
operations such as addition and multiplication. (p. 40)

OMC4 • Design of an optoelectronic graphics display processor,
Vincent P. Heuring, Melanie D. Berg, Univ. Colorado-Boulder. This
paper describes the design of an optoelectronic graphics display
processor. The processor has the advantages of simplicity and extremely
high-speed generation of computer graphic images. (p. 43)

OMC5 • A constant-time parallel sorting algorithm and its optical
implementation using smart pixels, Ahmed Louri, Jongwhoa Na,
Univ. Arizona; James Hatch, Jr., Trimble Navigation. A parallel sorting
algorithm and its efficient optical implementation are presented.
The algorithm sorts n data elements in constant time, independent
of the number of elements. (p. 46)

OMC6 • Analysis of a 3D computer optical scheme with bi-directional
interconnects, V. Morozov, J. Neff, A. Fedor, H. J. Zhou, Univ.
Colorado. A 3D computer model based on the Fresnel approximation
was developed. Noise and cross talk as a function of wavelength
variation, scattering, aberrations, and misalignment of the
components were estimated. (p. 49)

OMC7 • Impact of gate fanin and fanout limits on optoelectronic
circuit speed, Lianhua Ji, Vincent P. Heuring, Univ. Colorado-Boulder.
The inherent high fanin and fanout abilities of optoelectronics
can be systematically exploited to reduce circuit delay. These
optoelectronic circuits outperform their electronic equivalents. (p. 52)

OMC8 • Processing unit for stacked optical computing system:
discrete digital correlator, Hideo Kawai, Yoshinori Takeuchi,
Optoelectronics Matsushita Laboratory, Japan. We implemented the
discrete digital correlation using a processing unit consisting of a
spatial light modulator and fiber plate devices. (p. 55)

OMC9 • Software package for design of free-space optical interconnects,
Christopher L. Coleman, Arthur F. Gmitro, Univ. Arizona;
Paul E. Keller, Pacific Northwest Laboratory; Paul D. Maker, Jet
Propulsion Laboratory. This paper describes a software package
developed for the computer-aided design of multi-level phase Fourier
transform holograms. Features, limitations, and manufacturing of
the designs are discussed. (p. 58)

OMC10 • Detection of x-y misalignment error using optical cross
talk in a lenslet-array-based free-space optical link, G. C. Boisset,
B. Robertson, W. Hsiao, D. V. Plant, McGill Univ., Canada; H. S.
Hinton, Univ. Colorado-Boulder. The technique for using optical
cross talk to detect the lateral misalignment error of an array of beams
in a lenslet-based optical interconnect is described. (p. 62)


OMC11 • Comparison of GRIN rod lenses and planar ion-exchange
microlenses for the interconnection of optoelectronic device arrays,
N. McArdle, K.-H. Brenner, J. Moisel, Univ. Erlangen-Nurnberg,
Germany; A. Kirk, H. Thienpont, Vrije Univ. Brussel, Belgium.
Technologies for the compact interconnection of optoelectronic devices
are compared. The performance of GRIN rods and ion-exchange
microlenses and their suitability for current and future devices is
described. (p. 65)

OMC12 • Suitability of GRIN rod lenses for imaging arrays of PnpN
optical thyristors in optoelectronic computer architectures, Andrew
Kirk, Kristel Praet, Hugo Thienpont, Vrije Univ. Brussel, Belgium;
Neil McArdle, Karl-Heinz Brenner, Univ. Erlangen-Nurnberg,
Germany. A GRIN (gradient refractive index) lens imaging system
for optoelectronic device arrays is characterized. Experimental results
are compared with those obtained by ray-tracing. (p. 68)

OMC13 • Surface relief grating array on GaAs waveguides for
optical spot array generation, Elizabeth J. Twyford, Nan Marie
Jokerst, Paul A. Kohl, Georgia Institute of Technology; Tristan J. Tayag,
Army Research Laboratory. An array of 0.35-µm period gratings was
photoelectrochemically etched into 10 µm × 10 µm photolithographically
delineated areas. This device generates an array of optical
beams. (p. 71)

OMC14 • Analysis and optimization of off-axis imaging in planar
optical microsystems, Werner Eckert, Univ. Erlangen-Nurnberg,
Germany. The effects of off-axis imaging, used in the planar
microintegration approach, are studied theoretically and experimentally,
and techniques for the compensation are investigated. (p. 74)

OMC15 • Material limitations in volume holographic copying, Scott
Campbell, Yuheng Zhang, Pochi Yeh, UC-Santa Barbara. The viabilities
of all-optical, quasi all-optical, and hybrid optoelectronic
copying of multiple volume holograms are analyzed based upon
fundamental material limitations. (p. 77)

OMC16 • Organization for a parallel optical memory interface,
Gregory Deatz, Miles Murdocca, Rutgers Univ. An arbitrarily sized
region of interest is read from, or is written to, a parallel mass storage
device in logarithmic time, in a concept optically addressed
memory architecture. (p. 80)

OMC17 • Dynamically interconnected S-SEEDs, Simon M. Prince,
Frank A. P. Tooley, Mohammad R. Taghizadeh, Heriot-Watt Univ.,
U.K. Experimental details will be presented of a looped optical circuit
interconnecting two S-SEED arrays with a phase grating written
on a modified liquid crystal display. (p. 83)

OMC18 • Comparison of the performance characteristics of
Futurebus+ with an optical backplane, Tchang-hun Oh, Raymond
K. Kostuk, Univ. Arizona. An evaluation of delay in the Futurebus+
architecture indicates that current electro-optic interfaces provide
substantial performance improvement. These results are used in the
design of two optical backplane configurations. (p. 86)

OMC19 • Construction of a programmable multilayer analogue
neural network using space invariant interconnects, N. Collings,
A. R. Pourzand, R. Volkel, Univ. Neuchatel, Switzerland. An optical
multilayer Perceptron is under construction. The multiple imaging
of a 16 × 16 input array onto a liquid crystal television screen is
reported. (p. 89)


OMC20 • Optical bus systems using a cylindrical lens, Masahiko
Mori, Electrotechnical Laboratory, Japan. A new concept of one-to-many
optical interconnections with a cylindrical lens is proposed.
The simple structure achieves a large signal bandwidth and little
angular dependence. (p. 92)

GRAND  BALLROOM  C 

3:00pm-3:30pm 

Coffee  Break/Exhibits 

GRAND  BALLROOM  A/B 

3:30pm-5:00pm

OMD  •  Smart  Pixels:  1 

Alexander  Sawchuk,  University  of  Southern  California,  Presider 

3:30pm  (Invited) 

OMD1 • Critical issues in smart pixel design, Marc P. Y. Desmulliez,
John F. Snowdon, Andrew J. Waddie, Brian S. Wherrett, Heriot-Watt
Univ., U.K. Trade-offs associated with the design of opto-electronic
processing pixels are analyzed on algorithmic, electronic, and optical
grounds. The sorting task is chosen as a practical example.
(p. 96)

4:00pm 

OMD2 • Smart-pixel-based Viterbi decoder, Michael W. Haney,
George Mason Univ.; Marc P. Christensen, BDM Federal, Inc. A
free-space optically interconnected Viterbi decoding architecture is
described. A smart pixel design and a proof-of-concept demonstration
are reviewed. (p. 99)


4:15pm

OMD3 • Design space analysis of a lenslet-based optical relay system
interconnecting smart pixel arrays, D. R. Rolston, B. Robertson,
D. V. Plant, McGill Univ., Canada; H. S. Hinton, Univ. Colorado-Boulder.
A design space analysis is presented which provides a
method of quantifying the relationship between smart pixel processing
power and a lenslet-based optical interconnect. (p. 102)

4:30pm 

OMD4 • Cost-performance tradeoffs in optical interconnects,
Charles W. Stirk, Univ. Colorado. The performance advantage that
makes optical interconnects competitive with electronics is calculated
for a given cost system. The device defect densities determine
the ratio of I/O to logic. (p. 105)

4:45pm 

OMD5 • FET-SEED smart pixels for free-space digital optics systems,
C. B. Kuznia, A. A. Sawchuk, Univ. Southern California; L.
Cheng, Texas Christian Univ. Experimental results from ongoing
research and future system integration plans for FET-SEED smart
pixels in free-space digital optics systems are presented. (p. 108)

GRAND  BALLROOM  A/B 

8:00pm-10:00pm

OME  •  Panel  Discussion 

"Directions  in  Optical  Computing" 

The panelists will include D. A. B. Miller (AT&T), Richard C.
Williamson (MIT Lincoln Laboratories), Demetri Psaltis (Caltech),
David P. Casasent (Carnegie Mellon University), and A. A. Sawchuk
(USC). The discussion will be moderated by H. S. Hinton (University
of Colorado).




TUESDAY 


MARCH  14,  1995 


GRAND  BALLROOM  A/B 


8:30am-10:00am

OTuA  •  Optical  Design  and  Testing 

F.  A.  Tooley,  Heriot-Watt  University,  U.K.,  Presider 

8:30am  (Invited) 

OTuA1 • Demonstration of a high-speed, multichannel, optical
sampling oscilloscope, R. L. Morrison, S. G. Johnson, A. L. Lentine,
W. H. Knox, AT&T Bell Laboratories. We demonstrate a video-based
oscilloscope that samples the optical waveforms of a 2D modulator
array operating at 0.5-4 Gbit/s. This diagnostic tool serves an important
role in investigating free-space photonic circuits. (p. 112)

9:00am 

OTuA2 • Design and fabrication considerations for construction
of monolithic hybrid optical components for optical computing
applications, Suzanne Wakelin, Matthew W. Derstine, Optivision
Inc. Practical issues for the construction and utilization of hybrid
bulk and micro-optic components in smart pixel system implementations
are described. (p. 115)

9:15am 

OTuA3 • Universal module for split-and-join operations by cascading
refractive micro-optical elements, J. Moisel, K.-H. Brenner,
Univ. Erlangen-Nurnberg, Germany. We present a module which
realizes basic operations for optical data processing on a micro-optical
scale by cascading only two different refractive components.
(p. 118)

9:30am 

OTuA4  •  Refractive  microprisms  with  improved  surface  quality  by 
proton  polishing,  Maria  Kufner,  Stefan  Kufner,  Pierre  Pichon,  Pierre 
Chavel,  CNRS,  France;  Michael  Frank,  Univ.  Erlangen,  Germany. 
High-quality  miniaturized  surface  components  can  be  fabricated 
by  deep  proton  irradiation  using  the  polishing  effect  of  protons  to  a 
PMMA  target  moving  during  the  irradiation,  (p.  121) 

9:45am 

OTuA5 • Polarization-selective diffractive and computer-generated
optical elements, N. Nieuborg, C. Van de Poel, A. Kirk, H.
Thienpont, I. Veretennicoff, Vrije Univ. Brussel, Belgium. Polarization-selective
diffractive and computer-generated optical elements
for the implementation of fanout and interconnection operation have
been fabricated in calcite and characterized experimentally. (p. 124)

GRAND BALLROOM C


10:00am-10:30am
Coffee  Break/Exhibits 

GRAND  BALLROOM  A/B 


10:30am-12:00m 

OTuB  •  Optical  Neural  Networks 

Ravi  Athale,  George  Mason  University,  Presider 

10:30am (Invited)

OTuB1 • Photonic implementations of neural networks, Armand
R. Tanguay, Jr., Univ. Southern California. Photonic components for
densely-interconnected neural network implementations will be
described, including 2D arrays of individually coherent but mutually
incoherent sources, hybrid silicon/gallium arsenide spatial light
modulators, and volume holographic optical elements. (p. 128)


11:00am 

OTuB2 • Cascaded optical system for holographic classification of
temporal signals, C. Garvin, K. Wagner, Univ. Colorado-Boulder.
We discuss a holographic optical learning system for classifying
optically computed features of arbitrarily shifted wide-instantaneous-bandwidth
temporal signals, and present experimental results demonstrating
classification. (p. 131)

11:15am 

OTuB3 • Optoelectronic morphological processor for cervical
cancer screening, Ramkumar Narayanswamy, John P. Sharpe, Richard
M. Turner, Kristina M. Johnson, Univ. Colorado-Boulder. An
optoelectronic morphological processor has been designed to
prescreen pap-smear slides by detecting regions of interest (abnormal
cells) on the slide using the hit-or-miss transform. (p. 134)

11:30am 

OTuB4  •  Robot  navigation  using  a  peristrophic  holographic 
memory,  Allen  Pu,  Robert  Denkewalter,  Demetri  Psaltis,  California 
Institute  of  Technology.  A  small  vehicle  was  navigated  in  real  time 
through  complex  paths  using  a  peristrophic  holographic  memory 
as  its  database,  (p.  137) 

12:00m-1:30pm
Lunch  on  Own 

GRAND BALLROOM A/B


1:30pm-3:00pm

OTuC  •  Smart  Pixels:  2 

David A. B. Miller, AT&T Bell Laboratories, Presider

1:30pm (Invited)

OTuC1 • Demonstration of a dense, high-speed optoelectronic
technology integrated with silicon CMOS via flip-chip bonding and
substrate removal, Keith Goossen, A. L. Lentine, J. A. Walker, L. A.
D'Asaro, S. P. Hui, B. Tseng, R. Leibenguth, D. Kossives, D.
Dahringer, L. M. F. Chirovsky, D. A. B. Miller, AT&T Bell Laboratories.
A VLSI-density silicon CMOS/GaAs modulator smart pixel
switching node is shown operating above 250 Mbits/sec. The modulators
are fabricated using flip-chip bonding followed by substrate
removal that results in a composite soldered/thin film technology.
(p. 142)

2:00pm 

OTuC2 • Integration of InP-based thin-film emitters and detectors
onto a single silicon circuit, C. Camperi-Ginestet, B. Buchanan, S.
T. Wilkinson, N. M. Jokerst, M. A. Brooke, Georgia Institute of Technology.
The integration of both InP-based thin-film light-emitting
diodes and photodetectors with the same silicon circuit, which contains
emitter driver and photodetector amplifier circuits, is reported.
(p. 145)

2:15pm 

OTuC3 • InGaAs transceivers for smart pixels, D. T. Neilson, D. J.
Goodwill, L. C. Wilkinson, F. A. P. Tooley, A. C. Walker, Heriot-Watt
Univ., U.K.; C. R. Stanley, M. McElhinney, F. Pettier, Univ.
Glasgow, U.K. Improvements and new device options for the design
of InGaAs quantum well transceivers for smart pixels will be
presented. (p. 148)

2:30pm

OTuC4 • Cascadable thyristor optoelectronic switch operating at
50 Mbit/s with 7.2 femtojoule external optical input energy, Paul
Heremans, Bernhard Knupfer, Gustaaf Borghs, IMEC, Belgium;
Maarten Kuijk, Roger Vounckx, Vrije Univ. Brussel, Belgium. Dramatic
improvements in the performance of differential thyristor pairs
are reported: we demonstrate cascadable operation at 50 MHz with
7.2 femtojoule external optical input energy. (p. 151)

2:45pm 

OTuC5  •  Demonstration  of  2D  data  transcription  between  8x8 
arrays  of  completely  depleted  optical  thyristors,  Hugo  Thienpont, 
Andrew  Kirk,  Irina  Veretennicoff,  Maarten  Kuijk,  Roger  Vounckx, 
Vrije  Univ.  Brussels,  Belgium;  Paul  Heremans,  Bernhard  Knupfer, 
Gustaaf  Borghs,  IMEC,  Belgium.  We  present  the  first  demonstration 
of  optical  data  transcription  between  arrays  of  completely  depleted 
optical  thyristors,  (p.  154) 

GRAND  BALLROOM  C 

3:00pm-3:30pm
Coffee  Break/Exhibits 

GRAND  BALLROOM  A/B 

3:30pm 

OTuD  •  Postdeadline  Session 

John  Midwinter,  University  College,  London,  U.K.,  Presider 

GRAND  BALLROOM  C 


7:30pm-10:00pm

OTuE  •  Poster  Session:  2 

OTuE1 • Convergence of backward-error propagation learning in
photorefractive crystals, Gregory C. Petrisor, Adam A. Goldstein,
Edward J. Herbulock, B. Keith Jenkins, Armand R. Tanguay, Jr., Univ.
Southern California. We derive convergence conditions as a function
of learning rate and weight decay coefficients, spatial light modulator
gain, and exposure energy. (p. 158)

OTuE2 • Hybrid electro-optic resonator for image classification,
Robert T. Weverka, Optoelectronic Data Systems, Inc.; Kelvin H.
Wagner, Univ. Colorado-Boulder. A resonator using an acousto-optically
addressed angularly multiplexed volume hologram that
stores a bank of reference images achieves high-speed, massively
parallel image recognition. (p. 161)

OTuE3 • Optical flash analog to digital converter, Mark J. Prusten,
Arthur F. Gmitro, Univ. Arizona. A design for an optical analog to
digital converter using computer-generated holograms, quantum well
SEEDs, and a VCSEL array is presented. (p. 164)

OTuE4 • Detection and estimation theoretic accuracy enhancement
in discrete analog optical processors, Dogan A. Timucin, John
F. Walkup, Thomas F. Krile, Texas Tech Univ. Multiple hypothesis
testing and maximum likelihood and Bayesian parameter estimation
techniques are employed toward improving the computational
accuracies of three-plane discrete analog optical processors. (p. 168)

OTuE5 • Analog accuracy in optical vector-matrix processors, James
A. Carter, III, Tim A. Sunderlin, Peter A. Wasilousky, Dennis R. Pape,
Photonic Systems Inc. This paper describes techniques to achieve
accurate real-time optical analog signal generation for both external
and directly modulated laser diode sources. (p. 171)


OTuE6 • High-accuracy optical analog computing implemented
on optical fractal synthesizer, Jun Tanida, Wataru Watanabe, Yoshiki
Ichioka, Osaka Univ., Japan. A method for high-accuracy optical
analog computing is considered using interval arithmetic and fixed-point
theory. Two-variable simultaneous equations are studied on
the optical fractal synthesizer. (p. 174)

OTuE7 • Optimal intensity coding for digital images pixelated into
super-Gaussian beams, Fedor V. Karpushko, Academy of Sciences
of Belarus, Belarus. Contrasting to a binary sequential coding, a
spatially encoded image with pixels of super-Gaussian profiles increases
its information content as the code basis goes from 2 to
(p. 177)

OTuE8 • Optical information processing by synthesis of the coherence
function: real-time processing by using real-time holography,
T. Okugawa, K. Hotate, Univ. Tokyo, Japan. Real-time
holography is adopted in optical information processing by synthesis
of coherence function. Selective extraction of 2D information
from a 3D object is successfully demonstrated in real-time. (p. 180)

OTuE9 • Variations of the hybrid imaging concept for optical computing
applications, Stefan Sinzinger, Jurgen Jahns, Fernuniversität
Hagen, Germany. Hybrid imaging combines standard imaging with
optical array components. The physical parameters of the array elements
are used as design parameters for new interconnection
schemes. (p. 183)

OTuE10 • Photorefractive optical fuzzy-logic processor, Weishu
Wu, Changxi Yang, Scott Campbell, Pochi Yeh, UC-Santa Barbara.
A novel optical fuzzy-logic processor for parallel max-min operations
using volume grating degeneracy in photorefractive crystals is
proposed and demonstrated. (p. 186)

OTuE11 • Electro-optic parallel interfacing for neural computing
and a nonlinear organic spatial light modulator, Hiroyuki Arima,
Ichiro Tohyama, Masahide Itoh, Toyohiko Yatagai, Univ. Tsukuba,
Japan; Masahiko Mori, Electrotechnical Laboratory, Japan. An optoelectronic
interface consisting of a microlens array, a photodetector
array, and electronic circuits for neural computing and its nonlinear
organic material version are described. (p. 189)

OTuE12 • Implementation of optical logic operations by micro-optical
cascading of an array of differential PnpN-thyristor pairs,
Karl-Heinz Brenner, Werner Eckert, Edwin Gobel, Neil McArdle,
Jorg Moisel, Christoph Passon, Univ. Erlangen-Nurnberg, Germany.
Two PnpN-thyristor array devices are cascaded in a micro-optical
imaging system. By implementing multiple imaging, basic logic
operations are implemented optically. (p. 192)

OTuE13 • Demonstration of a laterally inhibitive optical preprocessor
using quantum well Fabry-Perot modulators, Brian Kelly,
John Hegarty, Paul Horan, Trinity College, Ireland; Frank Tooley,
Mohammad Taghizadeh, Heriot-Watt Univ., U.K. This paper describes
the construction and characterization of a laterally inhibitive
optical network based on the self-linearizing effect between
resonant-cavity quantum well modulators. (p. 195)

OTuE14  •  Custom  optoelectronic  smart  pixel  test  station,  Suzanne 
Wakelin,  Matthew  W.  Derstine,  Kelvin  K.  Chau,  Optivision  Inc.  We 
describe  a  custom-built  optoelectronic  smart  pixel  test  station  that 
is  currently  being  used  to  characterize  devices  that  will  be  used  in 
free-space  optical  computing  systems,  (p.  198) 


OTuE15 • Limitations of optical lateral intraconnection of smart
pixel arrays, Sunao Kakizaki, Paul Horan, Trinity College, Ireland.
The design and limitations of optically laterally intraconnected processing
arrays are considered and practical estimates of the bandwidth
and fanout are computed. (p. 201)

OTuE16 • Design and demonstration of projection and selection
modules for a VCSEL/HPT-based database filter, R. D. Snyder, J. W.
Lurkins, P. J. Stanko, F. R. Beyette, Jr., S. A. Feld, L. J. Irakliotis, P. A.
Mitkas, C. W. Wilmsen, Colorado State Univ. The design and demonstration
of projection and selection modules for an optoelectronic
data filter are presented. A slotted baseplate is the platform for this
design. (p. 204)

OTuE17 • Analysis of parasitic front-end capacitance and thermal
resistance in hybrid flip-chip-bonded GaAs SEED/Si CMOS receivers,
R. A. Novotny, A. L. Lentine, D. B. Buchholz, A. V.
Krishnamoorthy, AT&T Bell Laboratories. The effect of solder-bump
geometry used in flip-chip-bonded GaAs SEED photodetectors on
Si CMOS is analyzed theoretically and compared to measured results.
(p. 207)


OTuE18 • Considerations of the optical and optoelectronic hardware
requirements for implementation of stochastic bit-stream neural
nets, T. J. Hall, W. A. Crossland, J. S. Shawe-Taylor, M. van Daalen,
Univ. London, U.K.; W. Peiffer, M. Hands, H. Thienpont, Univ. Brussels,
Belgium. The paper addresses the optical and optoelectronic
implementation of a stochastic bit-stream neural system. Trade-offs
between the use of optical and electronic hardware are discussed.
(p. 210)

OTuE19  •  Optoelectronic  fuzzy  ARTMAP  processor,  Matthias 
Blume,  Sadik  C.  Esener,  UC-San  Diego.  This  paper  describes  an 
efficient  mapping  of  the  fuzzy  ARTMAP  algorithm  onto  a  neural 
architecture  and  a  proposed  implementation  based  on  the  D-STOP 
optoelectronic  processor,  (p.  213) 




WEDNESDAY 


MARCH  15,  1995 


GRAND  BALLROOM  A/B 


8:30am-10:00am

OWA  •  Optical  Storage 

Sing  Lee,  University  of  California-San  Diego,  Presider 

8:30am  (Invited) 

OWA1 • Volume holographic storage and retrieval of digital information,
Lambertus Hesselink, John F. Heanue, Matt C. Bashaw,
Stanford Univ. We discuss the experimental performance of a digital
holographic data storage device and architectural and materials
issues related to achieving large capacity and low bit error rates.
(p. 218)

9:00am 

OWA2  •  Shift-multiplexed  holographic  3D  disk,  Allen  Pu,  George 
Barbastathis,  Michael  Levene,  Demetri  Psaltis,  California  Institute 
of  Technology.  Shift  selectivities  of  a  few  microns  are  demonstrated 
theoretically  and  experimentally  using  a  novel  shift-multiplexing 
technique,  particularly  suitable  for  holographic  3D  disks,  (p.  219) 

9:15am

OWA3 • System issues in two-photon absorption-based 3D optical
memories, I. Cokgor, UC-San Diego; A. S. Dvornikov, UC-Irvine;
F. B. McCormick, K. Coblentz, S. C. Esener, P. M. Rentzepis,
Call/Recall, Inc. Optimum recording wavelength selection, material
fatigue, memory persistence, and image uniformity issues in two-photon
absorption-based 3D optical memories are discussed, and
experimental results presented. (p. 222)

9:30am 

OWA4  •  Sparse-wavelength  angularly  multiplexed  volume  holo¬ 
graphic  memory,  Scott  Campbell,  Xianmin  Yi,  Pochi  Yeh,  UC-Santa 
Barbara.  Wavelength  and  angle  multiplexing  are  hybridized  in  a 
volume  holographic  memory  system,  thereby  relaxing  demands  on 
optical  sources  and  components  while  increasing  information 
throughput  rates,  (p.  225) 

9:45am 

OWA5 • High-speed storage of wavelength-multiplexed volume
spectral holograms, X. A. Shen, Y. S. Bai, R. Kachru, SRI International.
A spectroholographic storage system for fast volume hologram
recording is demonstrated. The achieved frame transfer rate
exceeds 13 Kfps with random page access. (p. 228)

GRAND  BALLROOM  C 


10:00am-10:30am
Coffee  Break/Exhibits 


GRAND  BALLROOM  A/B 
10:30am-12:00m

OWB  •  Analog  Optical  Processing 

Terry  Turpin,  Essex  Corporation,  Presider 

10:30am (Invited)

OWB1  •  Application  of  Fourier  optics  for  defect  detection  in  mi¬ 
croelectronics  fabrications,  Lawrence  H.  Lin,  Optical  Specialties, 
Inc.  Fourier  optics  offers  a  simple  and  effective  means  for  detecting 
defects  in  the  fabrication  processes  of  semiconductor  devices  or  flat 
panel  displays.  Application  to  commercial  equipment  development 
will  be  presented,  (p.  232) 

11:00am

OWB2 • Adaptive beam-steering and jammer-nulling
photorefractive phased-array radar processor, Anthony W. Sarto,
Robert T. Weverka, Kelvin H. Wagner, Univ. Colorado-Boulder. An
adaptive beam-forming and jammer-nulling optical processor for
very large phased arrays has been designed, analyzed, and experimentally
demonstrated with jammer suppression of 33 dB. (p. 233)

11:15am 

OWB3 • All-optical parallel-to-serial conversion by holographic
spatial-to-temporal frequency encoding, Pang-chen Sun, Yuri T.
Mazurenko, Yeshayahu Fainman, UC-San Diego. Optical processors
that perform parallel-to-serial and serial-to-parallel data conversion
are introduced and experimentally demonstrated for long
distance optical network communications. (p. 236)

11:30am

OWB4 • The fractional Fourier transform in optics: do we need it?
is it useful? Adolf W. Lohmann, Weizmann Institute of Science, Israel;
David Mendlovic, Tel-Aviv Univ., Israel; Haldun M. Ozaktas,
Bilkent Univ., Turkey. We re-invented this transform as a speculation.
We realized the basic equivalence with other optical transforms.
It was useful for us on eight occasions. (p. 239)

11:45am

OWB5 • Optical wavelet processor for target detection, Tien-Hsin
Chao, Araz Yacoubian, Brian Lau, Jet Propulsion Laboratory; William
J. Miceli, Office of Naval Research. We describe two innovative
techniques for optical synthesis of two types of wavelets using
liquid crystal television spatial light modulators (LCTV SLMs): a 2D
Morlet wavelet and a ternary-valued, shape-discriminant wavelet.
The 2D Morlet wavelet is synthesized using two SLMs
for continuous amplitude and binary phase modulation. The ternary
wavelet is synthesized using only a single SLM. These wavelet
filters have also been inserted into an optical correlator and demonstrated
for target detection with improved discrimination over that
of the conventional correlation using a matched filter. (p. 242)

12:00m 
Lunch  on  Own 


GRAND  BALLROOM  A/B 


1:30pm-3:00pm

LWA  •  Joint  Session  with  Spatial  Light  Modulators 

John  N.  Lee,  U.S.  Naval  Research  Laboratory,  Presider 

1:30pm (Plenary)

LWA1 • Future directions in "smart" quantum well SLMs and processing
arrays, David A. B. Miller, AT&T Bell Laboratories. Integration
of arrays of high-speed quantum well modulators with
electronics, including hybrid integration with silicon, may exploit
the best of optics and electronics.

David Miller received a B.Sc. in Physics from St. Andrews University,
and performed his graduate studies at Heriot-Watt University
where he was a Carnegie Research Scholar. After receiving the Ph.D.
degree in 1979, he continued to work at Heriot-Watt University as
a Lecturer in the Department of Physics. He moved to AT&T Bell
Laboratories in 1981 as a Member of the Technical Staff, and since
1987 has been a Department Head, currently of the Advanced
Photonics Research Department. His research interests include optical
switching and processing, nonlinear optics in semiconductors,
and the physics of quantum-confined structures. He has published
over 170 technical papers and 4 book chapters, delivered over 50
conference invited talks and over 20 short courses, and holds more
than 30 patents. (p. 248)

2:15pm  (Plenary) 

LWA2 • Device-architecture interaction in optical computing,
Ravindra A. Athale, George Mason Univ. Device technologies and
processor architectures exert a strong influence on each other in
optical computing. I will discuss examples of successful and
unsuccessful interactions between these two communities. The role of
the CO-OP in enhancing this interaction will be outlined.

Ravi Athale received his B.Sc. and M.Sc. degrees in Physics from
Bombay University and IIT/Kanpur, respectively. He did his Ph.D.
thesis work in Digital Optical Computing at the University of California,
San Diego. He worked at the Naval Research Laboratory and BDM
International before joining the George Mason University faculty. He is
a Fellow of the Optical Society of America and a member of SPIE and
IEEE/LEOS. (p. 249)

GRAND BALLROOM C


3:00pm-3:30pm 
Coffee  Break/Exhibits 


GRAND  BALLROOM  A/B 


3:30pm-5:00pm 

OWC • Joint Plenary Session with Spatial Light Modulators

Demetri  Psaltis,  California  Institute  of  Technology,  Presider 

3:30pm  (Plenary) 

OWC1  •  History  of  optical  computing:  a  personal  perspective, 
Adolf  W.  Lohmann,  Weizmann  Institute  of  Science,  Israel.  The  value 
of  optics  for  signal  transport  and  electronics  for  signal  interaction 
will  be  discussed  as  well  as  how  optical  signal  processing  is  in¬ 
structive  for  optical  computing,  (p.  252) 

4:15pm  (Plenary) 

OWC2  •  Acoustic  signal  processing  with  photorefractive  optical 
circuits,  Dana  Z.  Anderson,  Univ.  Colorado-Boulder.  Self-organized 
learning  of  temporal  sequences  is  implemented  with  a 
photorefractive  oscillator  having  a  time  delay  element  in  the  feed¬ 
back  path.  (p.  255) 

GRAND BALLROOM C


6:30pm-8:00pm 
Conference  Reception 
THURSDAY


MARCH  16,  1995 


GRAND  BALLROOM  A/B 


8:30am-10:00am

OThA • Interconnections: 1

Joseph W. Goodman, Stanford University, Presider

8:30am  (Invited) 

OThA1 • Implementation of optical clock distribution in a
supercomputer, Dave Kiefer, Vernon W. Swanson, Cray Research,
Inc. An application for optical clock distribution in a 500 megahertz
supercomputer system has been demonstrated. This paper
describes the challenges in implementing this distribution system.
(p. 260)

9:00am 

OThA2 • Design of electrophotonic computer networks with
nonblocking and self-routing functions, Shigeru Kawai, Hisakazu
Kurita, Optoelectronics NEC Laboratory. Using free-space optics and
VCSELs, WDM and SDM switches are proposed for achieving three-stage
optical networks which have the same functions as crossbar
switches, and up to 1K channels scalability. (p. 263)

9:15am

OThA3 • Collisionless wavelength-division multiple access protocol
for free-space cellular hypercube parallel computer systems,
Kuang-Yu J. Li, B. Keith Jenkins, Univ. Southern California. Dense
communication, multiple access, simple control, and improved
network throughput and packet delay can be achieved by incorporating
space, wavelength, and time multiplexing. (p. 266)

9:30am 

OThA4 • Optoelectronic communication speedup on mesh processors
using reduced cellular hypercube interconnections, J.-F.
Lin, A. A. Sawchuk, Univ. Southern California. Optoelectronic reduced
cellular hypercube interconnections significantly improve the
processor communication efficiency of mesh-connected array processors.
Performance improvements for some common operations
and applications are discussed. (p. 269)

9:45am 

OThA5 • 16-channel FET-SEED-based optical backplane interconnection,
D. V. Plant, B. Robertson, G. C. Boisset, N. H. Kim, Y. S.
Liu, M. R. Otazo, D. R. Rolston, A. Z. Shang, McGill Univ., Canada;
H. S. Hinton, W. M. Robertson, Univ. Colorado-Boulder. The design
and operation of a representative portion of a bidirectional free-space
photonic backplane is described. (p. 272)

GRAND  BALLROOM  C 


10:00am-10:30am
Coffee Break/Exhibits


GRAND  BALLROOM  A/B 

10:30am-12:10pm

OThB  •  Interconnections:  2 

Mike  Feldman,  University  of  North  Carolina,  Presider 

10:30am (Invited)

OThB1 • Interconnection theory and optoelectronic computing
architectures, Haldun M. Ozaktas, Bilkent Univ., Turkey. Various
optically interconnected computer architectures are compared based
on a number of considerations including interconnection density
and heat removal. (p. 276)

11:00am 

OThB2 • High-density 300-Gbps/cm² parallel free-space optical
interconnection design considerations, Dean Z. Tsang, MIT Lincoln
Laboratory. A high-density, high-throughput 300-Gbps/cm²
parallel free-space optical interconnection has been designed and
demonstrated. The impact of optical, electrical, mechanical, and
thermal issues is described. (p. 277)

11:15am 

OThB3 • Weighted space-variant local interconnections based on
micro-optic components: cross talk analysis and reduction,
Chingchu Huang, B. Keith Jenkins, Charles B. Kuznia, Univ. Southern
California. A cross talk reduction method for a fixed-weight neural
network optical interconnection system based on diffractive
optical element design techniques is presented and simulated.
(p. 280)

11:30am

OThB4 • Optical transpose interconnection system: system design
and component development, W. Lee Hendrick, Philippe J.
Marchand, Frederick B. McCormick, Ilkan Çokgor, Sadik C. Esener,
UC-San Diego. The optical transpose interconnection system supports
shuffle, mesh-of-trees, and hypercube architectures. A computer
design of the optics, a novel beam-splitting component, and a
complete optoelectronic system model are presented. (p. 283)

11:45am

OThB5 • Applications of fiber-image guides to bit-parallel optical
interconnects, Yao Li, Ting Wang, NEC Research Institute; H. Kosaka,
S. Kawai, K. Kasahara, NEC Corp., Japan. We propose and experimentally
demonstrate novel applications of fiber-image guides to
bit-parallel optical interconnects for digital processors. Advantages
and challenges of this technology will also be discussed. (p. 286)

GRAND  BALLROOM  A/B 

12:00pm

Closing  Remarks 

Kelvin  H.  Wagner,  University  of  Colorado-Boulder 
Monday, March 13, 1995


Optical Computing Systems


OMA 8:30 am-10:00 am
Grand Ballroom A/B


H. Scott Hinton, Presider
University of Colorado-Boulder
Optoelectronics Technology for Real World Computing

Hiroyoshi Yajima
Electrotechnical  Laboratory 

1-1-4,  Umezono,  Tsukuba,  305  JAPAN 


1.  Research  Framework 

Light is expected to be a new information medium, because of its extended transmission
capacity  and  massively  parallel  processing  capability.  Optics  provides  new  device 
technology  as  well  as  new  architectures  and  algorithms  in  the  Real  World  Computing 
Program,  which  aims  at  flexible  information  processing  using  massively  parallel  and 
massively  distributed  processing. 

Optical  technology  to  be  developed  in  the  program  is  classified  into  three  categories. 
•Optical  interconnection 
•Optical  neural  systems 
•Optical  digital  systems 

Optical interconnection aims at overcoming the so-called "wiring limit" which electronic
systems are now confronting. Optical interconnection is also the key technology for
realizing optical neural systems and optical digital systems.

Optical  neural  systems  aim  at  realizing  real-time  learning  and  associative  processing  of 
images  and  other  distributed  data  by  connecting  neurons  with  light. 

Optical digital systems aim at realizing massively parallel processing with computational
accuracy using light.

Figure 1 shows research subjects on optical computing in the RWC program.

Optical  computing  technologies  are  based  on  the  presumption  of  using  newly  developed 
optical  devices.  Modularization  of  optoelectronic  devices  is  also  an  important  goal  of  the 
Real  World  Computing  Program. 

2. Research Topics

a)  Optical  interconnection 

Optical  interconnection  merges  the  advanced  electronics  technology  that  is  represented  by 
VLSI  with  optical  communication  technology  to  eliminate  information  transmission  problems 
in  electronic  systems,  such  as  propagation  delay,  line  to  line  crosstalk,  space  factors  of 
wiring  and  mounting,  and  large  power  consumption. 

Optical interconnection aims at overcoming the above limits and at offering high-speed,
large-capacity and flexible information transmission.

In  order  to  develop  optical  interconnection,  the  following  issues  are  important. 

Optical interconnection devices realize high-speed, large-capacity and reconfigurable
interconnection networks, using high-density multiplexing technologies in the area of time,
space and wavelength. The development of ultrafast optical interconnection devices,
space-parallel optical interconnection devices, wavelength-parallel devices, and passive optical
elements is required.

Optical interconnection network architecture and design technology for interchip and
intrachip optical interconnection should be developed.

b)  Optical  neural  systems 

Optical  neural  systems  realize  the  real-time  processing  of  images  and  other  spatially 
distributed  information  or  spectral  information  through  learning  and  associative  processing, 
using  massive  and  flexible  interconnectivity  of  light.  To  develop  such  systems,  the 
following  issues  are  important. 

The establishment of optical neural models, such as a direct image processing model, a
model based on physical phenomena of light, an expandable modular model, and a model
suitable for analog devices, is required.

The development of optical neural devices with a large-scale and high-speed learning
function, and of devices with direct recognition, processing and feature extraction functions
for the input image, is required.

c)  Optical  digital  systems 

Optical digital systems realize massively parallel and accurate processing of images and
other spatially distributed information or spectral information with logical computation
principles, using massive and flexible connectivity of light. To develop such systems, the
following issues are important.

The development of optical logic devices, such as ultrafast optical logic devices,
space-parallel optical logic devices, wavelength-parallel optical logic devices and other peripheral
passive optical devices, is required.

The development of optical logic circuits, such as optical interconnection between optical
logic devices and between optical functional modules, is required.

The establishment of architecture and design technology, input-output interfaces and
programming languages is also required.

3.  Joint  Optoelectronics  Project 

In spite of the potential of light as a medium of information, the device technologies for
optical computing are immature. The RWC program is intended to form a common
platform for the expected exchanges between the optical computing architecture group and
the optical device group.

Japan and the US will start a joint optoelectronics project (JOP) this year to provide a
prototyping service for experimental devices and modules in optoelectronics as an integral
part of the RWC. It is intended to stimulate R&D activity in optoelectronics for computing in both
countries and to encourage effective commercialization. Figure 2 shows the scheme of the JOP.

A broker office is funded in each country and serves as the facilitator between the User,
who has a novel design to be fabricated, and the Suppliers, who perform fabrication.


Reference 


Japan  Computer  Quarterly,  JIPDEC,  No.89,  1992 
Fig. 2 Joint Optoelectronics Project (JOP)
Implementation of a 16-channel Sorting Module
Douglas A. Baillie, Frank A.P. Tooley, Simon M. Prince,
Nicola L. Grant, Julian A.B. Dines, Marc P.Y. Desmulliez & Mohammad R. Taghizadeh
Department of Physics, Heriot-Watt University, Edinburgh, EH14 4AS UK
Tel: +44 31 451 3056, fax: 3136 e-mail: phydabl@clust.hw.ac.uk

This  paper  will  present  experimental  details  of  a  sorting  module  demonstration  system.  The  sorting 
module  which  is  currently  under  construction  is  shown  as  a  functional  schematic  in  figure  1.  Figure  2  is  a 
photograph  of  the  optics.  The  system  implements  the  bitonic  sort  based  on  Batcher’s  algorithm  implemented 
with a perfect shuffle. A re-circulating rather than pipelined arrangement is used to minimise hardware
requirements to 2 smart pixel chips:

• a sorting node array (self-routing exchange/bypass nodes), and

• a shift register array which acts as the input/output interface.

The system currently under construction is a 16 channel 8-bit word sorting module based on 1 µm
CMOS and InGaAs modulator/detectors. InGaAs grown on a GaAs substrate is used in preference to
GaAlAs since at the operation wavelength of 1.064 µm, the substrate is transparent which simplifies flip-chip
assembly.

It is anticipated that this system will be operating in March 1995. The 16-channel dual-rail system
acts as a test-bed for a later system which uses 0.7 µm CMOS and an array of 32×32 channels. The folded
perfect shuffle interconnect required is implemented using two fan-out-to-2 binary phase gratings and a 2×
telescope composed of custom-designed 42 mm and 21 mm efl lenses.

The  output  of  the  sorting  nodes  is  shuffled  and  input  to  an  8-bit  wide  shift  register  array,  the  output 
of  which  is  the  new  input  to  the  sorting  node.  The  input  pattern  is  loaded  into  the  shift  register  electrically 
and  the  sorted  data  is  output  similarly  and  simultaneously.  It  is  required  that  a  control  bit  be  loaded  into  each 
node  at  the  start/end  of  each  exchange/bypass  operation.  This  selects  whether  a  higher  or  lower  valued  word 
is  to  be  routed  to  the  upper  output  of  the  exchange/bypass  nodes.  A  spatially-variant  and  cycle-variant 
control  pattern  is  required  to  set  the  state  of  the  sorting  nodes.  In  the  1024-channel  system,  512  channels  of 
control  information  need  to  be  provided  at  100  ns  intervals.  This  is  an  aggregate  control  data  rate  of  5  GHz. 
To  avoid  this  formidable  requirement,  the  bitonic  sort  algorithm  has  been  manipulated  so  that  this  can  be 
achieved by optically embedding the control. The first bit detected by a sorting node sets its state;
subsequent bits represent the data to be shuffled [1].
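
The role of the control bit can be sketched in software. The following is a minimal model of Batcher's bitonic sort written as a direct compare-exchange network, not the recirculating shuffle-based arrangement of the hardware; the names `exchange_bypass` and `bitonic_sort` are hypothetical. Each node receives a control flag that selects whether the higher or lower valued word goes to the upper output, and the flag varies with both position and stage.

```python
def exchange_bypass(upper, lower, high_to_upper):
    """One sorting node: exchange or bypass two words.

    If high_to_upper is True the larger word leaves on the upper output,
    otherwise the smaller word does.
    """
    swap = (upper < lower) if high_to_upper else (upper > lower)
    return (lower, upper) if swap else (upper, lower)


def bitonic_sort(words):
    """Sort a power-of-two number of words in ascending order.

    Returns the sorted list and the number of compare-exchange stages used
    by this direct network (assumption: no shuffle-based recirculation).
    """
    n = len(words)
    if n == 0 or n & (n - 1):
        raise ValueError("number of words must be a power of two")
    a = list(words)
    stages = 0
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance between the two inputs of a node
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0   # position- and stage-dependent control
                    a[i], a[partner] = exchange_bypass(
                        a[i], a[partner], high_to_upper=not ascending
                    )
            stages += 1
            j //= 2
        k *= 2
    return a, stages


if __name__ == "__main__":
    data = [(37 * i + 11) % 256 for i in range(16)]  # sixteen 8-bit words
    print(bitonic_sort(data))
```

The stage count returned here applies to the direct network only; a shuffle-based implementation reuses one node array and one fixed interconnect, so its iteration count differs.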

Sorting  Node 

The sorting node CMOS circuit has initially been designed in 1 µm CMOS and preliminary versions
of it tested electrically at 30 MHz, limited by the testing technique. In HSPICE simulations, it works reliably
at a clock frequency of 100 MHz. Work is proceeding to increase this through the use of a pipeline circuit
design and the use of 0.7 µm CMOS. The self-routing exchange/bypass switches in 10 ns. There is latency as
the 'packet' of 8 bits has to be processed and be stored in the shift register. A latch reset and control load
operation added to the 8-bit word therefore result in 10 clock cycles between sorting operations. An
exchange/bypass decision on each word is made every 100 ns. The algorithm requires $(\log_2 M)^2 - \log_2 M + 1$
iterations to sort M words. Therefore there is a time of $(\log_2 16)^2 - \log_2 16 + 1 = 13$ times 100 ns before a new
input can be sorted. This is equal to 1.3 µs.
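
The timing figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch using the numbers quoted in the text (8-bit words, 10 ns clock period, 10 clock cycles per iteration); the function names and the split of the 10 cycles into 8 data bits plus 2 overhead cycles are assumptions made for illustration.

```python
def shuffle_sort_iterations(m_words):
    """Iterations quoted in the text: (log2 M)^2 - log2 M + 1."""
    log_m = m_words.bit_length() - 1
    if m_words < 2 or (1 << log_m) != m_words:
        raise ValueError("M must be a power of two, at least 2")
    return log_m ** 2 - log_m + 1


def sort_time_ns(m_words, bits_per_word=8, overhead_cycles=2, clock_period_ns=10):
    """Time before a new input can be sorted, in nanoseconds.

    overhead_cycles stands for the latch reset and control load cycles
    (assumed to be 2, so that each iteration takes 10 clock cycles).
    """
    cycles_per_iteration = bits_per_word + overhead_cycles
    return shuffle_sort_iterations(m_words) * cycles_per_iteration * clock_period_ns


if __name__ == "__main__":
    print(shuffle_sort_iterations(16))  # 13
    print(sort_time_ns(16))             # 1300 ns, i.e. 1.3 µs
```

Under the same assumptions, the later 1024-channel system would need 91 iterations per sort.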

Input/Output 

The input must provide eight sets of 16 binary images at 10 ns intervals, once every 1.3 µs. The
output must extract the same amount of data. The information is loaded into and out of the shift registers
along 4 tracks operating at 100 MHz. This takes 4×8×10 ns = 320 ns. The aggregate input (and output) data
rate is 79 Mbit/s (8×16 bits/1.62 µs).
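
The input/output figures can be checked in the same way. The sketch below assumes that the 1.62 µs period is the 1.3 µs sort time plus the 320 ns load time, and that the factor of 4 in the load time is the number of words carried by each of the 4 tracks; both readings are interpretations of the text, and the function names are hypothetical.

```python
def io_load_time_ns(words_per_track=4, bits_per_word=8, clock_period_ns=10):
    """Time to shift one data set in or out along the tracks, in ns."""
    return words_per_track * bits_per_word * clock_period_ns


def aggregate_rate_mbit_s(channels=16, bits_per_word=8,
                          sort_time_ns=1300, load_time_ns=320):
    """Aggregate input (or output) data rate in Mbit/s.

    bits per nanosecond equals Gbit/s, so multiply by 1000 for Mbit/s.
    """
    total_bits = channels * bits_per_word
    period_ns = sort_time_ns + load_time_ns
    return total_bits / period_ns * 1000.0


if __name__ == "__main__":
    print(io_load_time_ns())                      # 320 ns
    print(round(aggregate_rate_mbit_s(), 1))      # about 79 Mbit/s
```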
Power Budget

Consider  the  optical  power  required  to  read  the  shift  register  array  and  write  to  the  sorting  node. 

Assume binary phase grating losses of 30%, losses in polarizing optics, etc. of 50% and a differential
InGaAs spatial light modulator that modulates from 40% to 10%. The perfect shuffle loss as explained
below is 60%.

The  sensitivity  of  the  receiver  is  critical  to  the  operation  of  the  system.  It  is  desirable  to  use  the 
minimum  laser  power  on  the  modulators  as  the  photocurrent  generated  there  is  the  major  source  of  thermal 
load.  Thermal  considerations  limit  the  maximum  pixel  count,  the  operation  speed  and  the  pixel  density. 
Minimizing the laser power is advantageous since it is envisaged that a high power laser source will be the
least reliable and most expensive component in any system.

The receiver used for both smart pixel arrays is a 3-stage transimpedance amplifier. A
transimpedance amplifier is appropriate as the detector capacitance has less influence upon the receiver
speed or sensitivity. Consequently, the detector area can be large which minimizes the demands put on the
optical system and the optomechanics. For example, a 20 µm diameter detector can be used and a 10 µm
beam. The beam can be generated by a compact and inexpensive f/4 lens. Alignment stability to a precision
of only 10 µm is required which is readily accomplished.

Another  advantage  of  a  transimpedance  amplifier  is  that  its  sensitivity  can  be  chosen  to  match  that 
required  by  making  the  circuit  more  or  less  complicated.  The  size  of  the  receiver  may  be  of  concern  if  a 
small  pixel  pitch  is  required.  The  power  consumption  of  the  circuit  also  has  to  be  considered  carefully.  As 
the  modulator  thermal  load  decreases,  that  due  to  the  receiver  increases.  The  optimum  complexity  of 
receiver  design  will  depend  on  system  speed  and  noise  sources. 

HSPICE simulations indicate that the sensitivity of the receiver is sufficient for >100 MHz
operation with a differential input power of only 6 µW. This is a 'switching energy' of 60 fJ. With this power
required on the detectors, there would be around 100 µW/beam incident on the modulators. The laser power
required for 1024 devices is 300 mW. Higher contrast modulators would allow single-rail operation. With a
contrast of 8:1, a small improvement in the optical losses (50% dropping to 25%) and a doubling of the
receiver sensitivity, the laser power required would decrease to only 15 mW.
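
The link budget quoted above can be traced numerically. The following sketch assumes that the losses combine multiplicatively in the order modulator, polarizing optics, perfect shuffle, and that the 30% grating loss sits between the laser and the modulators with two beams per device (dual-rail). These orderings and the function names are assumptions for illustration rather than statements from the text.

```python
def differential_power_at_detector_uw(power_on_modulator_uw=100.0,
                                      modulator_high=0.40, modulator_low=0.10,
                                      polarizing_loss=0.50, shuffle_loss=0.60):
    """Differential optical power reaching a detector pair, in µW."""
    differential = power_on_modulator_uw * (modulator_high - modulator_low)
    return differential * (1.0 - polarizing_loss) * (1.0 - shuffle_loss)


def switching_energy_fj(differential_power_uw, bit_period_ns=10.0):
    """Energy per bit: µW times ns gives femtojoules."""
    return differential_power_uw * bit_period_ns


def laser_power_mw(devices=1024, beams_per_device=2,
                   power_on_modulator_uw=100.0, grating_loss=0.30):
    """Laser power needed ahead of the grating, in mW (assumed dual-rail)."""
    total_uw = devices * beams_per_device * power_on_modulator_uw
    return total_uw / (1.0 - grating_loss) / 1000.0


if __name__ == "__main__":
    p = differential_power_at_detector_uw()
    print(round(p, 3))                        # about 6 µW
    print(round(switching_energy_fj(p), 1))   # about 60 fJ
    print(round(laser_power_mw()))            # just under 300 mW
```

Under these assumptions the chain reproduces the 6 µW, 60 fJ and roughly 300 mW figures, which suggests the quoted losses are meant to be applied in this way.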

Thermal  Load 

The  thermal  load  on  the  chip  is  significant  since  uniform  temperature  is  required  over  the  chip. 
Joule heating will occur due to photocurrent. Assuming a modulator responsivity of 0.5 A/W, the total
thermal  load  due  to  the  presence  of  light  is  less  than  0.3  mW/diode.  An  HSPICE  simulation  of  the  power 
dissipation  of  an  array  of  this  circuit  showed  that  around  3  mW/node  would  be  generated.  The  total  load  of 
the  1024-channel  system  of  around  2  W  is  such  that  the  temperature  can  be  maintained  uniform  over  the 
entire  array  at  the  required  operation  temperature  of  the  modulators. 

Optical  Considerations 

The folded perfect shuffle interconnect is implemented as a modified form of the segmented-2-shuffle
proposed by Cloonan et al. [2]. The shuffle has been split into two fanout by 2 stages as shown in
figure 3. This is advantageous as the optical loss is balanced between the two links of the optical circuit.
Between the sorting nodes and the shift register array for the 16-channel system is a 2 times telescope (figure
4). Thus the shift register pixels are 200 by 400 µm while the sorting node pixels are 200 µm square.

Both smart pixel arrays have been designed with 20 µm windows, so only low-power lenses are
required.  The  32x32  array  requires  an  optical  system  with  a  field  of  only  9  mm  (diagonal).  This  is  within  the 
capabilities  of  the  custom  lenses  we  have  already  designed  and  constructed.  It  is  anticipated  that  results  of 
the  operation  of  the  16-channel  system  will  be  presented  at  the  meeting. 
1 Introduction

For many years, optics has been cited as the technology
of choice for communication. In applications such
as telephones, cable TV, and local area networks, it
seems that optical communication has proved to have
technological and economical advantages. However, in
the area of parallel processing, optics has not yet made
an impact, despite a number of efforts to demonstrate
various concepts and principles with extensive experiments
and prototyping. It seems that, so far, a conclusive
opinion has not been reached. Parallel processing
systems still use electronic networks for inter-processor
communication.

In this paper we present a novel approach to optical
interconnections for parallel processing. We use the
term “MOI” (Massive Optical Interconnections) to emphasize
the scalability of our approach, to match current
and near future MPP (Massively Parallel Processing)
systems. We present a new network structure and operation
mode that has been tailored specifically to take
advantage of the benefits of optical technology without
being limited by its drawbacks. We present the results
of an effort to design and build a prototype of this new
network and the tradeoffs made. Finally we describe
future plans for a larger network prototype, to be used
in a real MPP system.

2  Network  Architecture 

In  this  section  we  describe  the  operating  principles 
of  our  novel  network.  Referring  to  Figure  1  we  describe 
the  following  principles  of  operation: 

Routing in optics: Electronic VLSI technology offers
more cost-effective solutions for “processing”. Routing
functions in a network require extensive logical
operations, as well as memory (buffers). Optics cannot
yet compete in the implementation of such functions
directly, so these should be avoided in an optical network.

Switching  elements:  The  switching  (selection)  of 
destinations  is  to  be  done  at  the  source,  by  activating 
one  VCSEL  in  the  array.  The  specific  VCSEL  chosen 
corresponds  to  a  particular  destination. 

Power dissipation: If only one (or a few) of the
VCSELs in an array are active at any given time, the
overall power dissipation is low. Such a system may be
scalable to thousands of VCSEL devices per array.

Distributed 3-D layout: The MPP system is made
from processing boards (clusters). Each board has a
small number of processors interconnected by a fast
electronic network. The optical network is distributed
over the 3-D volume of the system. In the future, a
processing element may be made of only one chip
(including memory, CPU and communication), thus high
integration and even distribution is an advantage.

Opto-Electronic issues: If the processing is done in
electronics, and communication is done by optical,
free-space technology, the minimum number of E-O and
O-E conversions is two. The I/O pin number bottleneck
at the conversion points does not exist if only one
VCSEL (or a few) are to be operational in the VCSEL
array. For detection, only one photo receiver is
associated with every processor, so only a small number of
receivers are needed.

Figure 1: The Interconnection Cache Network.

Network operation mode: A reconfigurable mode of
operation is used to avoid optical routing, more than
two opto-electronic conversions and delays due to routing
contentions. The fast routing is provided by electronic
“interconnection cache” switches (small crossbars)
that are placed between the processors and the
free-space optical network, on each board.

Parallel  applications:  Many  parallel  applications  ex¬ 
hibit  “switching  locality”  in  their  communication  re¬ 
quirements,  thus  can  easily  be  mapped  into  such  a  com¬ 
bined  (circuit  switching  and  packet  switching)  mode  of 
network  operation. 

Main  roles  of  optics:  Connectivity  (for  MPP  sys¬ 
tems)  and  bandwidth. 

For  more  details  on  issues  related  to  the  optical  net¬ 
work  architecture  refer  to  the  following:  overview  of 
the  network  and  MPP  architecture  [1,2],  performance 
studies  [3,4],  application  studies  [5,6],  and  possible  pro¬ 
cessor  implementation  [7]. 

3  Optical  Design 

This  section  presents  the  design  of  the  optical  net¬ 
work  prototype  based  on  the  previously  described  prin¬ 
ciples.  The  experimental  system  layout  is  shown  in  fig¬ 
ure  2.  It  has  64  optical  channels,  for  interconnecting 
64  PEs  (Processing  Elements)  arranged  as  4  columns, 
each  having  1  board  (‘cluster’)  of  16  PEs.  The  con¬ 
nections  of  four  8x8  VCSEL  arrays  and  four  4x4  fiber 
arrays  are  also  shown  in  the  figure. 

Figure 2: Schematic view of the 64 channel system.

Up to M - 1 sources may send to one receiver
(where M is the number of boards). One optical channel
is allocated per receiver, into which the M - 1 VCSELs
couple by using optimized partially reflecting
micromirrors. Since the various channels across the array
are at different distances from their destinations,
differing mirror reflectivities are used. Arrays of
micromirrors are placed between 2 prisms to form
beamsplitting/combining cubes. The individual beams within a
cube remain spatially separate, in the form of small
'microbeams'. Beam diffraction limits the number of
microbeam channels that can be supported within one
cube. However, typical maximum numbers are more
than sufficient for network sizes of > 10000 channels.
Required beam spacings are also compatible with VCSEL
array element spacings, and commercially available
microlens arrays (typically 250 µm).

Board to board spacings within a column would be
on the order of s ≈ 30 mm. Relaying the complete array
of channels from board to board would require a 2-lens,
4-f relay, needing impractical lenses of focal length
s/4 ≈ 7 mm operating with 10 mm of field for large
networks. Instead, we use microlens arrays, on axis,
with very low NAs. Only one lens array is needed
between boards, with focal length s/2 ≈ 15 mm and
typical aperture 250 µm. The distances and beam
diameters may be adjusted for the unique condition in which
the diffraction-limited beam waists at ±f are equal,
enabling both transmitted and newly combined beams
to propagate identically with only one relay microlens
between stages. Clearly, with typical microbeam
diameters at the lens of 150 µm, and focal length 12.5 mm, we
have an inherently low aberration relay system.
Raytracing simulations of the wavefront suggest that
manufacturing and alignment errors will dominate over
theoretical microlens aberrations.
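
The equal-waist condition described above can be illustrated with a short calculation. This is only a sketch under stated assumptions: an ideal Gaussian (TEM00) beam, a thin lens, and beam sizes quoted as 1/e^2 radii. The function name `symmetric_relay` is hypothetical, and the source does not say how its 150 µm "diameter" is defined, so the numbers are meant only to show that the quoted values are of the right order.

```python
import math


def symmetric_relay(wavelength_m: float, focal_length_m: float):
    """Return (waist radius, beam radius at the lens) for a Gaussian beam
    whose waists at the front and back focal planes of a thin lens are equal.
    """
    # A waist w0 placed at the front focal plane of a thin lens is transformed
    # into a waist w0' = wavelength * f / (pi * w0) at the back focal plane.
    # Requiring w0' == w0 gives w0 = sqrt(wavelength * f / pi).
    w0 = math.sqrt(wavelength_m * focal_length_m / math.pi)
    # For this waist the Rayleigh range equals f, so at the lens (a distance f
    # from the waist) the beam radius has grown by a factor sqrt(2).
    w_lens = w0 * math.sqrt(2.0)
    return w0, w_lens


if __name__ == "__main__":
    wavelength = 843e-9  # VCSEL wavelength quoted in the text
    for f_mm in (12.5, 15.0):  # focal lengths mentioned in the text
        w0, w_lens = symmetric_relay(wavelength, f_mm * 1e-3)
        print(f"f = {f_mm} mm: waist radius {w0 * 1e6:.1f} um, "
              f"beam diameter at lens {2 * w_lens * 1e6:.0f} um")
```

Under these assumptions the beam diameter at the lens comes out somewhat below the 250 µm lens aperture, which is consistent with the low-NA, low-aberration relay argued for in the text.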

Inter-column distance is on the order of S ≈ 300 mm,
making bulk lenses more appropriate. These are used
in 2-lens, 4-f configurations to maintain the parallelism
of the microbeams. The bulk lens focal length is
F ≈ 75 mm, so individual beams have extremely small
NA’s.  This  results  in  only  negligible  aberrations  added 
by  the  lenses.  The  main  concern  is  spherical  aberration 
which  will  cause  position  and  pointing  errors  in  beams 
far  from  the  center  of  the  array.  However,  commercial 
doublets  corrected  for  spherical  aberration  can  perform 
well  in  our  system  for  many  columns  of  relay  and  large 
array  sizes,  as  shown  by  figure  3.  This  figure  shows 
the  worst  case  microbeam  position  errors  vs.  field  (ar¬ 
ray  position)  for  sixteen  80mm  doublets  relaying  over 
8  column-to-column  distances  (equivalent  to  a  worst 
case  beam  path  for  a  7  column  system).  The  4  critical 
positions  for  errors  are  at  the  front  focal  point  and  the 
lens surface of 2 successive microlenses (f = 14 mm). The
results are acceptable errors of 10 µm or less for 10 mm
field, equivalent to using 14 mm micromirror cubes. The
maximum number of channels will thus be in the range
of 7000. This is a good indication for system scalability
with simple optics. The VCSEL arrays used in our
prototype were 8x8 arrays operating at 843 nm, with
10 µm apertures on 250 µm centers. Outputs were
focused by refractive microlens arrays (f = 600 µm) to a
waist nominally 14 mm from the microlens array. Metal
micromirror arrays were fabricated and assembled with
prisms into beam-combiner cubes under a microscope.
14 mm diffractive microlens arrays were added to the
appropriate cubes for in-column relaying. Inter-column
relaying was done with commercial laser diode 40 mm
doublets. Each board had one 4x4 multimode fiber array
for collection of the beams destined for the 16 PEs
at that board. For more information see [8].

Figure 3: 16 stage bulk lens relay errors. (Plot against
field in mm; 80 mm doublet, x16, 843 nm.)

4  Experimental  Results 

Each fiber array was made by inserting stripped,
clad fibers through 125 µm holes on 500 µm centers, in
thin BeCu plates made by chemical etching. Two plates
were separated by a 2 mm distance for proper parallel
fiber angles. The assembly was made rigid using epoxy
cement, and fiber ends polished using a polishing
machine. Finally, an f = 600 µm microlens array was
positioned and optically cemented into place over the fiber
ends for coupling of the free-space beams into the
arrays. Fiber center accuracies of ±5 µm were achieved.

The optomechanical assembly is a semi-kinematic
magnetic slot-rail system. The goal was to avoid
mechanical translation stages. Accurate pre-assembly was
needed for the VCSEL/collimation assemblies and
micromirror/microlens cube assemblies. Final alignment
was made by rotation and separation of pairs of Risley
prisms in the slots.

Figure 4: Low and high data rates channel results.
(a) 50 Mb/s received data patterns; (b) 1.6 Gb/s received eye diagram.

On each VCSEL array (i.e., cluster board), a maximum
of 12 lasers could operate in parallel at a relatively
low data rate (50 Mb/s). These low rate channels
were driven by a logic analyzer (HP 16500B with
16520A (PG master) and two 16521A (PG slaves)). A
high bandwidth laser (out of 8 possible per board)
was driven by a 3 Gb/s BERT (HP 71600B). This
allows us to check parallel operation (and crosstalk) and
high bandwidth operation. Each fiber output was
connected to a commercially available 100 Mb/s fiber
receiver module. The output of this module was
connected to an input channel of the logic analyzer. This
was done for all the low bandwidth channels. For the
high bandwidth channel, the fiber output was fed to
a high bandwidth receiver connected to the bit-error
tester. The power budget was as follows: VCSEL
output power was in the range of 600-900 µW with
100% modulation for most devices. The range of
overall estimated transmissions is 0.054 to 0.211. Lowest
power at the receiver was 27 µW and highest power was
190 µW. Figure 4(a) shows the results of the low
bandwidth channels as captured at the various receivers by
the logic analyzer. Figure 4(b) shows the 1.6 Gb/s eye
diagram of the BERT driven fast channel. All tested
fast  channels  operated  with  an  error  rate  of  <  10”^^, 
and most operated up to 1.6 Gb/s, limited by driver
circuitry design and impedance mismatching of the 50 Ω
signal line to the VCSEL.
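
The power budget figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below simply multiplies the quoted VCSEL output powers by the quoted end-to-end transmissions; the function name `received_power_uw` is hypothetical, and pairing the highest power with the best path (and the lowest with the worst) is an assumption made only for illustration.

```python
def received_power_uw(vcsel_power_uw: float, transmission: float) -> float:
    """Optical power at the receiver for a given source power and path loss."""
    return vcsel_power_uw * transmission


# Values quoted in the text.
T_MIN, T_MAX = 0.054, 0.211        # range of estimated transmissions
P_MIN_UW, P_MAX_UW = 600.0, 900.0  # VCSEL output power range ("most devices")

best_case = received_power_uw(P_MAX_UW, T_MAX)   # strongest source, best path
worst_case = received_power_uw(P_MIN_UW, T_MIN)  # weakest typical source, worst path

# Source power that would explain the lowest measured receiver power (27 uW)
# if that channel also used the lowest-transmission path.
implied_source_uw = 27.0 / T_MIN

if __name__ == "__main__":
    print(f"best case:  {best_case:.1f} uW")
    print(f"worst case: {worst_case:.1f} uW")
    print(f"source power implied by 27 uW on the worst path: {implied_source_uw:.0f} uW")
```

The best case agrees with the 190 µW measured maximum. The measured minimum of 27 µW lies slightly below the computed worst case, which fits the text's remark that the 600-900 µW range holds only for most devices.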

The  system’s  operation  was  limited  by  the  variation 
in  the  characteristics  of  the  VCSELs,  even  within  one 
array. Only 45 out of the 55 possible 50 Mb/s rate
lasers operated successfully. The remaining 10 either
did  not  lase  due  to  defects,  or  had  very  high  threshold 
currents  and  could  not  operate  simultaneously  with  the 
other  lasers  in  the  array  (since  they  all  have  a  common 
bias  current).  The  working  channels  were  driven  with 
pseudo  random  bit  patterns.  For  simplicity,  both  the 
VCSEL  driver  and  receiver  circuitry  were  AC  coupled. 
To  maintain  zero  DC  level  accumulation,  each  set  of  4 
bits  was  sent  twice,  once  as  is  and  once  inverted. 
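
The DC-balancing scheme described above (each group of 4 bits sent once as is and once inverted) can be sketched as follows. The function names `dc_balance` and `dc_unbalance` are hypothetical, and the decoder's consistency check is an addition for illustration rather than something the text describes.

```python
def dc_balance(bits, group=4):
    """Send each group of bits twice: once as is, once inverted."""
    coded = []
    for i in range(0, len(bits), group):
        chunk = bits[i:i + group]
        coded.extend(chunk)                 # original group
        coded.extend(1 - b for b in chunk)  # inverted copy
    return coded


def dc_unbalance(coded, group=4):
    """Recover the data and check that every inverted copy is consistent."""
    data = []
    for i in range(0, len(coded), 2 * group):
        block = coded[i:i + 2 * group]
        half = len(block) // 2
        chunk, inverted = block[:half], block[half:]
        if any(a == b for a, b in zip(chunk, inverted)):
            raise ValueError("inverted copy does not match: bit error detected")
        data.extend(chunk)
    return data


if __name__ == "__main__":
    payload = [1, 1, 1, 0, 0, 0, 0, 0]
    line = dc_balance(payload)
    print(line)
    print("ones:", sum(line), "of", len(line))
```

Every coded block contains exactly as many ones as zeros, so no DC level accumulates on the AC-coupled link. The price is that only half of the transmitted bits carry payload.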

Another problem was the instability of the
polarization of the VCSEL outputs. This caused up to a
2:1 time-dependent variation in detected signal
powers. Some improvements were possible with careful
bias-point selection, and the use of quarter-wave plates
at the VCSELs to help 'flatten' the variation for the
most severe cases. Polarization control is an
important improvement required for future VCSEL arrays.
This will allow the use of simple micromirror structures
which are all sensitive to polarization whether they are
of metal, dielectric or holographic types.

Figure  5  shows  the  experimental  setup  used  to  test 
our  network  prototype.  Figure  6  presents  a  close-up 
view  of  the  64-port  network  prototype. 

5  Conclusions  and  Future  Work 

We  presented  the  experimental  results  of  a  64  port 
optical  network  prototype.  We  reviewed  the  design  and 
principles  of  operation  for  larger  networks  with  thou¬ 
sands  of  channels.  Our  target  is  to  realize  large  net¬ 
works  by  exploiting  the  connectivity  and  bandwidth 
optics  has  to  offer.  Our  approach  avoids  many  of  the 
problems  found  in  electronic  networks  (blocking,  la¬ 
tency  delays  due  to  routing  and  queuing  in  multiple 
stages,  complexity  of  operation).  Optics  is  used  to 
form  high  bandwidth  point-to-point  connections  that 
are  reconfigurable.  These  are  complemented  by  fast 
routing,  small  electronic  switches.  The  overall  result¬ 
ing  network  appears  to  parallel  applications  as  if  it  has 
a  high  connectivity,  high  bandwidth  and  low  routing 
latency, most of the time. The interconnection cache
and switching locality principles of operation, for
parallel processing applications, make our network an
attractive alternative, both in performance and possible
lower cost, for future MPP systems.

Figure 6: The 64-ports network prototype.

We  plan  to  build  a  larger  prototype  having  256  to 
512  ports,  all  of  them  to  operate  at  >  1  Gb/s  rates.  For 
the  current  prototype,  we  used  external  instruments 
to  drive  the  network.  For  the  future  optical  network, 
we  hope  to  have  an  MPP  system  with  “real”  proces¬ 
sors.  This  MPP  system  will  have  interconnection  cache 
switches  and  will  run  real  parallel  applications.  Our 
goal  is  to  connect  the  larger  optical  network  prototype 
to  this  MPP  system  and  examine  in  real  operation  the 
benefits  of  our  approach. 

Abstract: Intelligent Optical Backplanes can enhance computing and communications architectures
by  simultaneously  transporting  and  processing  digital  data  at  terabit  aggregate  rates,  thereby 
enabling  new  paradigms.  Prospects  for  Intelligent  Optical  Backplanes  and  their  smart  pixel  arrays 
will  be  described. 


Summary: An Intelligent Optical Backplane consists of a number of Processing Boards
(i.e., Printed Circuit Boards, MultiChip Modules or combinations) interconnected by a
number of parallel optical channels (typically 10,000) as shown in Figure 1. To access these
optical channels each processing board contains one or more smart pixel arrays. The
smart pixel arrays provide the potential capability of transporting and simultaneously
processing terabits of data per second, a capability which is unrivalled with other
electronic or photonic technologies. The concept of an Intelligent Optical Backplane
will exploit these unique capabilities to enhance the computing and communications
architectures of the future.

Figure 1: A photonic backplane. (Labelled elements: fiber inputs, PCBs, smart pixel arrays,
switching node, ATM line card, interface ICs, message processor ICs, optomechanical support
structure, parallel optical channels.)


Conventional  architectures  can  be  termed  "connection  constrained"  and  are  limited  by  the 
communication  and  processing  bandwidth  available.  The  world's  most  powerful  supercomputers 
currently  have  bisection  bandwidths  in  the  tens  of  Gigabits/sec.  (The  "bisection  bandwidth"  can  be 
defined  as  the  bandwidth  which  crosses  a  bisector  which  splits  the  architecture  into  two  halves  of 
equal size). Today's connection-constrained supercomputers occupy multiple cabinets of electronics
interconnected  with  wires.  Over  the  last  few  decades  advances  in  technology  have  continuously 
impacted  systems  architectures  by  reducing  size  and  increasing  performance  by  roughly  a  factor  of 
two  every  year.  If  the  trend  is  to  continue  the  supercomputer  of  today  will  fit  within  a  backplane 
rack  within  a  decade  and  will  offer  bisection  bandwidths  in  the  terabit/second  range. 
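
The extrapolation in the paragraph above is a simple compound-growth estimate. The sketch below works it through, assuming a starting bisection bandwidth of 10 to 50 Gb/s as a stand-in for "tens of Gigabits/sec" and taking the quoted factor of two per year at face value; the function name `projected_bandwidth_gbps` and the starting values are illustrative assumptions.

```python
def projected_bandwidth_gbps(current_gbps: float, years: int,
                             annual_factor: float = 2.0) -> float:
    """Bandwidth after compounding a fixed annual improvement factor."""
    return current_gbps * annual_factor ** years


if __name__ == "__main__":
    for start in (10.0, 50.0):  # assumed present-day values, in Gb/s
        future = projected_bandwidth_gbps(start, years=10)
        print(f"{start:.0f} Gb/s today -> {future / 1000:.1f} Tb/s in ten years")
```

Ten doublings give a factor of 1024, so any starting point in the tens of Gb/s lands in the terabit/second range within a decade, as the text states.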

Using  the  silicon-SEED  technology  [3]  a  1  cm  die  of  silicon  has  the  potential  of  1,000  -  2,000 
optical  I/O,  an  equal  or  smaller  amount  of  electronic  I/O,  with  clock  rates  in  the  hundreds  of 
megabits/sec  per  I/O.  Each  smart  pixel  array  may  simultaneously  process  and  transport  hundreds  of 
gigabits  of  optical  data  per  second,  and  a  processing  board  with  ten  smart  pixel  arrays  may  process 
up  to  a  few  terabits  of  optical  data  per  second.  The  unique  ability  to  transport  and  process  vast 
amounts  of  data  per  second  will  impact  future  architectures  by  enabling  new  paradigms  for 
computing  and  communications.  Potential  applications  include  terabit  point-to-point  and  multi¬ 
point  photonic  ATM  switching  architectures,  terabit  shared  memory  and  message-passing  parallel 
computing  architectures,  terabit  dataflow  computing  architectures,  terabit  "Intelligent  Memory 
Systems",  and  terabit  parallel  database  architectures. 
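
The aggregate rates quoted above follow from multiplying the number of optical I/O by the per-I/O clock rate. A minimal sketch, in which the per-I/O rates of 200 and 300 Mb/s are assumed values standing in for "hundreds of megabits/sec" and the function name `aggregate_gbps` is hypothetical:

```python
def aggregate_gbps(num_io: int, rate_mbps: float) -> float:
    """Aggregate throughput of one smart pixel array in Gb/s."""
    return num_io * rate_mbps / 1000.0


if __name__ == "__main__":
    low = aggregate_gbps(1000, 200.0)   # 1,000 optical I/O at an assumed 200 Mb/s
    high = aggregate_gbps(2000, 300.0)  # 2,000 optical I/O at an assumed 300 Mb/s
    arrays_per_board = 10               # "ten smart pixel arrays" per board
    print(f"per array: {low:.0f} to {high:.0f} Gb/s")
    print(f"per board: {low * arrays_per_board / 1000:.0f} to "
          f"{high * arrays_per_board / 1000:.0f} Tb/s")
```

With these assumed rates a single array carries a few hundred Gb/s and a ten-array board a few Tb/s, matching the orders of magnitude in the text.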

Smart  Pixel  Arrays:  A  connectivity  model  for  a  photonic  backplane  architecture  is  illustrated  in 
Figure  2.  Each  PCB  contains  multiple  smart  pixel  arrays  which  manage  access  to  the  optical 
channels  in  the  free-space  photonic  backplane.  The  smart  pixel  arrays  can  be  organized  into  a  2D 
"communication  slice",  where  each  slice  interfaces  a  set  of  E  electrical  channels  with  a  set  of  O 
optical channels, where E < O typically, as shown in Figure 3. (Within a slice each channel is w
bits wide.) A slice can inject data from the set of E electronic channels onto a subset of the O
optical channels, and can extract data from a subset of the O optical channels to the
electrical channels. In an Intelligent Optical Backplane a slice may process optical data as it
passes by to determine which data it wishes to extract, and by enhancing the processing
capabilities new paradigms for computing and communications can be explored. Some potential
applications are outlined.

This research was supported by NSERC Canada Grant OGP0121601.

Figure 2: Connectivity model.

Figure 3: Smart pixel array.

Photonic Switching: Smart pixel arrays which detect equivalence between two binary patterns
(i.e., addresses and destinations) can be used to implement point-to-point photonic
switching. Each smart pixel array is assigned a unique binary address from an
associated message-processor (MP). Each data packet has a header containing the
destination address. Smart pixel arrays are constantly comparing the packet destination
address with their unique addresses, and
change their state to extract the packet when a match is detected [4]. The processing requires an
EXOR and OR gate per pixel. To enable multipoint photonic switching the packet header consists of
2 fields, the mask and destination fields, where a 0 in a mask bit implies a logical don't care for that
bit position. This functionality enables multipoint switching to a wide range of selected subsets.
The processing requires ≈ an EXOR, OR and AND gate per pair of pixels (assuming the mask and
destination appear on separate pixels).

Figure 4: Logic for Multipoint Photonic Switching.
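
The masked address comparison described above can be modelled in software as follows. This is only an illustrative sketch of the logic (EXOR to compare, AND with the mask, OR to collect mismatches), not the smart pixel circuit itself; the function name `should_extract` and the 4-bit example values are hypothetical.

```python
def should_extract(address: int, destination: int, mask: int, width: int) -> bool:
    """Return True if a packet header matches this array's address.

    A 0 in a mask bit means "don't care" for that bit position, so only the
    bit positions with mask = 1 have to agree.
    """
    all_bits = (1 << width) - 1
    # EXOR marks differing bits; AND with the mask drops the don't-care bits.
    mismatch = (address ^ destination) & mask & all_bits
    # OR-reducing the mismatch bits: any remaining 1 means "no match".
    return mismatch == 0


if __name__ == "__main__":
    width = 4
    destination, mask = 0b1010, 0b1100  # only the two upper bits are compared
    receivers = [a for a in range(1 << width)
                 if should_extract(a, destination, mask, width)]
    print([format(a, "04b") for a in receivers])
```

With an all-ones mask the comparison reduces to the point-to-point case, where exactly one address matches.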

Range Inclusion: Smart pixel arrays which detect inclusion within a range, where the ranges are
integers or floating point numbers, can be used in an intelligent backplane which performs
distributed sorting efficiently. (It has been estimated that a significant fraction of the world's
computing power is spent sorting.) The packet header consists of one field denoting an integer or
floating point number called the "key". Each
smart pixel array is supplied with two bounds from the MP. The conditions for extraction may be
inclusion or exclusion of the key within the range or whether the key is lower than or greater than
the bounds, as shown. While not shown here the processing requires ≈ 12 binary gates per pixel.

Figure 5: Logic for Range Inclusion/Exclusion. (Defines an L-bit lower bound, an L-bit upper bound
and an L-bit destination; the Extract conditions combine the comparisons of Dest against the two
bounds, as a logical product for inclusion and as a logical sum for exclusion.)
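
The extraction conditions described above, and their use for distributed sorting, can be sketched in software. The names `Mode`, `extract_key` and `distribute` are hypothetical, and the use of strict inequalities is an assumption; the text does not say how keys equal to a bound are handled.

```python
from enum import Enum


class Mode(Enum):
    INCLUDE = "include"  # key lies inside the range
    EXCLUDE = "exclude"  # key lies outside the range
    BELOW = "below"      # key is lower than the lower bound
    ABOVE = "above"      # key is greater than the upper bound


def extract_key(key, lower, upper, mode: Mode) -> bool:
    """Decide whether an array with bounds (lower, upper) extracts this key."""
    if mode is Mode.INCLUDE:
        return lower < key < upper
    if mode is Mode.EXCLUDE:
        return key < lower or key > upper
    if mode is Mode.BELOW:
        return key < lower
    if mode is Mode.ABOVE:
        return key > upper
    raise ValueError(f"unknown mode: {mode}")


def distribute(keys, bounds):
    """Each board keeps the keys that fall inside its own range."""
    return [[k for k in keys if extract_key(k, lo, hi, Mode.INCLUDE)]
            for lo, hi in bounds]


if __name__ == "__main__":
    # Three boards covering integer keys 0-9, 10-19 and 20-29 (strict bounds).
    bounds = [(-1, 10), (9, 20), (19, 30)]
    print(distribute([17, 3, 25, 10, 9], bounds))
```

Once every board holds only the keys of its own range, each board can sort locally and the concatenation of the boards' results is globally sorted.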


Parallel Prefix: A "parallel prefix" operation over N processors is defined as follows. Let each
processor i have a key $k_i$. After the parallel prefix each processor i contains
$\sum_{j=0}^{i} k_j$ (the addition
can be replaced with any associative operator). To implement the parallel prefix each smart pixel
array i operates on the keys broadcasted by processors 0..i and reports the result to the MP.
Alternatively each array may operate upon its own key and an incoming running sum, and report
the result to the MP and simultaneously forward it to the next processor. Parallel prefix
computations occur frequently and are often "hard-wired" into parallel computing machines to
execute faster. Hence the implementation of the parallel prefix directly by the smart pixel array will
enhance photonic computing architectures of the future. Smart pixel arrays which detect the
maximum (or minimum) key from all keys in packet headers will also prove equally useful.
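
The two implementation options described above can be compared with a small software model. The function names `prefix_by_broadcast` and `prefix_by_running_value` are hypothetical; the model only shows that both options produce the same result, not how the smart pixel hardware would do it.

```python
import operator
from functools import reduce
from itertools import accumulate


def prefix_by_broadcast(keys, op=operator.add):
    """Array i combines the keys broadcast by processors 0..i."""
    return [reduce(op, keys[:i + 1]) for i in range(len(keys))]


def prefix_by_running_value(keys, op=operator.add):
    """Array i combines its own key with the running value it receives
    from array i-1, then forwards the result to array i+1."""
    return list(accumulate(keys, op))


if __name__ == "__main__":
    keys = [3, 1, 4, 1, 5]
    print(prefix_by_broadcast(keys))
    print(prefix_by_running_value(keys))
    # Any associative operator works, e.g. a running maximum.
    print(prefix_by_running_value(keys, max))
```

The broadcast variant lets every array work independently, while the running-value variant needs only one combination per array but passes the result along the chain.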


Pattern Matching: Functional memory systems such as the Content Addressable Memories (CAM)
allow the pre-processing of data before it is extracted from a dense VLSI memory [5]. Smart pixel
arrays which perform pattern matching over terabits of data may enable new models for distributed
data caches, content-addressable-memories, data-flow architectures and parallel database systems.
The VLSI CAM memory provides storage and retrieval with limited I/O bandwidth and with dense
processing capabilities (perhaps many thousands of comparisons within a single CAM IC). Smart
pixel arrays generally provide a very large I/O and processing bandwidth with generally
fewer comparisons occurring within the IC. Hence, the smart pixel arrays may find applications as
"gateways" which perform transportation, processing and selection of search keys at terabit
aggregate rates, leading to further processing on the processing boards.

Figure 6: Logic for pattern matching. (Defines an L-bit mask, an L-bit pattern and an L-bit key,
together with the Extract condition.)


Let each smart pixel array store i patterns and each packet header contain one or more search keys.
The arrays perform bit-wise comparisons with the search keys and the patterns in parallel and
matching keys are extracted. (The comparison may span multiple clock cycles to allow for long
search keys.) The previous functionality can be enhanced by associating a bit-mask with each search
key, where the comparators examine only the bit positions specified by a non-zero mask bit as
before. The processing requires ≈ between 6 and 6i gates per pixel depending on the slice design.

The  pattern  matching  concept  can  be  extended  by  computing  the  Hamming  Distances  between  the 
search  keys  and  patterns  (according  to  the  bits  specified  in  a  mask  field)  and  extracting  the  data  if  a 
threshold  is  exceeded.  Keys  which  match  in  b  or  more  bits  meet  the  threshold  criterion  and  are 
extracted  for  further  off-chip  processing.  One  may  envision  a  terabit  content  addressable  memory 
where  the  strict  match  criterion  of  conventional  CAMs  is  replaced  by  an  exact  or  near  match  based 
upon  Hamming  distance.  The  photonic  backplane  may  find  applications  in  parallel  database 
systems  and  fuzzy  logic  inference 
systems. 
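
The near-match criterion can be sketched as below. The sketch follows the wording of the text (a key is extracted if it matches a stored pattern in b or more of the masked bit positions); the function names `matching_bits` and `near_match`, and the 8-bit example values, are hypothetical.

```python
def matching_bits(key: int, pattern: int, mask: int) -> int:
    """Count the masked bit positions in which key and pattern agree."""
    # XOR marks the differing bits; inverting gives the agreeing bits, and the
    # (non-negative) mask keeps only the positions that are to be examined.
    agree = ~(key ^ pattern) & mask
    return bin(agree).count("1")


def near_match(key: int, patterns, mask: int, b: int) -> bool:
    """Extract the key if it agrees with any stored pattern in >= b bits."""
    return any(matching_bits(key, p, mask) >= b for p in patterns)


if __name__ == "__main__":
    stored = [0b10100111, 0b00001111]
    key = 0b10110110
    print(matching_bits(key, stored[0], 0xFF))
    print(near_match(key, stored, mask=0xFF, b=6))
    print(near_match(key, stored, mask=0xFF, b=7))
```

Setting b equal to the number of ones in the mask recovers the strict match of a conventional CAM, while smaller values of b admit progressively looser matches.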


Summary:  This  paper  has  outlined 
potential  capabilities  and  applications 
of  Intelligent  Photonic  Backplanes. 
While  the  field  is  relatively  new  the 
prospects  appear  promising. 
where  Ind{ v,B)-\  if  Hamming _ Distance (p,0)>b

Figure  7:  Logic  for  Hamming  Distance  pattern  matching. 

1 Introduction

What  is  wrong  about  Optical  Computing  is  the  im¬ 
plied  search  for  “general  purpose  computing”.  We 
think  that  such  an  attempt  has  little  chance  to  result 
in  a  practical  system  for,  at  least,  the  next  ten  years. 
The  main  reason  is  the  economical  justification.  What 
such  an  “optical  computing”  system  may  offer  has  to 
be  compared  with  the  value  of  the  application  and  the 
alternatives (electronics). On the other hand,
communication in general is an area where optics has proved
to be a real blessing. Long distance communication
is most economically done today using optical fibers.
We  think  that  another  realistic  search  for  good  optical 
applications  should  now  be  done  for  shorter  distances. 
A  possible  good  direction  may  be  the  communication 
needs  of  Massively  Parallel  Processing  (MPP)  systems. 
In such a system, a large number (10's of thousands) of
Processing  Elements  (PEs)  are  to  be  interconnected. 
A  PE  can  be  seen  as  made  of  a  high-end  single  chip 
CPU  available  today,  with  memory  and  communica¬ 
tion  circuits.  We  do  not  view  the  other  possible  mean¬ 
ing  of  MPP,  namely  processing  and  interconnections  at 
the  single  gate  or  device  level,  as  practical  to  consider. 
This  paper  describes  the  views  of  the  author  from  the 
computer  architecture’s  standpoint,  with  the  hope  to 
serve  as  a  pointer  to  the  “Optical  Computing”  com¬ 
munity.  Although  much  has  been  done  in  the  area  of 
optical  communication  technology  for  the  past  10-20 
years,  and  many  optical  network  experimental  systems 
have  been  proposed,  it  seems  that  optics  has  not  yet 
found  its  expected  place  as  the  interconnection  tech¬ 
nology  of  choice  for  MPP  systems.  In  this  paper  we  try 
to  suggest  some  possible  reasons  preventing  the  com¬ 
mon  use  of  optical  interconnections  in  MPP  systems, 
in  a  hope  to  focus  attention  on  what  really  needs  to 
be  done  to  advance  the  field.  We  would  suggest  focus¬ 
ing  on  searching  for  a  processing-less  solution  rather 
than  trying  to  mimic  the  existing  thinking  of  electronic 
networks.  We  outline  several  key  principles  essential 
to  follow  to  reach  realistic  and  economical  solutions  of 
optical  interconnections  for  MPP  systems.  An  example 
of  using  such  principles  for  an  MPP,  free-space  network 
is  presented  in  [1]. 

2  Optics  and  Economics 

Economics,  as  in  everything  else,  plays  an  important 
role  when  considering  the  use  of  optical  communica¬ 
tion  and  computing.  The  economics  problem  can  also 
be  seen,  not  only  as  a  practical  issue,  but  as  having  a 
technical  aspect  as  well.  In  the  making  of  a  computer 
system,  an  architect  has  often  a  need  to  compromise, 
to  balance  between  opposite  trends  and  competing  sit¬ 
uations.  One  such  balancing  point  can  be  the  choice  of 
alternative  ways  to  accomplish  a  function.  The  obvious 
reason  for  optics  not  to  have  a  major  role  in  MPP  sys¬ 
tems  might  be  the  cost.  An  architect  may  not  care  so 
much  about  the  physics  behind  the  optical  devices  (sim¬ 
ilar  to  not  caring  much  about  the  quantum  mechanics 
theory  that  is  behind  the  operation  of  the  electronic 
components).  But  he  worries  about  the  cost  and  ma¬ 
turity  of  the  technology. 


To  understand  the  enormous  task  facing  optical 
technology,  we  would  like  to  briefly  review  the  fore¬ 
casted  advance  in  VLSI  technology,  from  the  economic 
point  of  view  and  addressing  their  functional  relation¬ 
ship  to  optics.  A  recent  special  report  in  the  July 
4,  1994  issue  of  Business  Week  [2]  reviewed  the  size 
and  speed  prediction  for  the  VLSI  technology  develop¬ 
ment  in  the  near  future.  The  economic  issue  is  directly 
related  to  this  technological  development.  Today,  a 
$4,000 PC, based on an Intel Pentium microprocessor,
has the power of a 1988 top of the line Cray Y-MP
supercomputer.  It  is  predicted  that  because  of  the  re¬ 
duction  in  the  transistor  size,  and  the  improvement  in 
its  speed,  by  the  year  2011  one  DRAM  chip  will  have 
64  Gbit  capacity,  and  the  microprocessor  clock’s  speed 
will  reach  800  MHz.  The  article  cites  this  formidable 
progress  in  VLSI  technology  as  the  “miracle  of  eco¬ 
nomics”,  providing  almost  “free”  computing  power. 

In contrast to these predictions, it seems optical
“computing” is well behind. Indeed there are examples
of making logical gates and integrated optics that
may be pointed out as possible candidates for future
use. However, there are a few points to make, even if
the technology progresses to a point similar to current
development in VLSI circuits:

Higher  processing  power:  This  is  one  justification 
for  possible  use  of  optics.  Then  the  question  is  “where 
will  you  use  such  power?”.  Well,  it  is  obvious  you  do 
not  need  this  extra  processing  power  in  all  cases.  Many 
applications  that  are  very  common  may  not  require 
much  processing.  The  microcomputer  in  the  washing 
machine,  in  the  coffee-maker,  in  the  car’s  engine,  and 
even  in  the  cheap  home  PC,  can  do  perfectly  well  with 
available processing power, that is already offered by
electronic VLSI circuits.

What about supercomputing applications: Well,
supercomputers are hardly a large market today. It is
difficult to see that optical computing has merit for
sustaining current progress in supercomputers. With
many supercomputing companies going out of business,
this  by  itself  may  not  justify  a  similar  investment  in 
technology  needed  to  advance  optics  to  today’s  VLSI 
technology  level.  If  the  target  of  optical  computing  is 
optical  supercomputing,  chances  are  not  so  bright  for 
optics  either. 

Special  purpose  (analog)  computing?  Indeed  for 
such an application, optics may very well have
superiority over electronics. However, it may be that any
development  in  such  special  purpose  optical  comput¬ 
ing,  may  not  help  much  to  advance  the  practicality 
and  economical  use  of  general-purpose  optical  comput¬ 
ing  technology. 

The  purpose  of  all  of  the  above  was  mainly  to  argue 
that,  at  least  for  the  making  of  optical  interconnections, 
computing  should  be  avoided  as  much  as  possible.  The 
question  is  then,  how  to  make  a  network  that  does  not 
need  optical  computation  or  processing? 

3  Optical  Interconnections 

In  [3],  Goodman  et  al  suggest  the  use  of  optical  in¬ 
terconnections  in  a  VLSI  chip.  Sources  and  detectors 
are to be integrated on the same chip as other
electronic VLSI circuits, and free-space communication
using a holographic routing element is suggested. This
is  an  example  of  a  too  fine-grain  communication  that, 
at  least  for  now  (i.e.,  the  next  10  years)  is  not  prac¬ 
tically  and  economically  possible.  The  reasons  may 
be  because  of  the  difficulty  of  integrating  sources  and 
detectors  (chip  area  needed  for  operations),  packaging 
issues  (the  need  to  place  a  hologram  in  a  precise  po¬ 
sition  above  a  chip),  and  the  relative  cost  of  making 
a  high-bandwidth  optical  communication  link  vs.  the 
computation  performed  by  the  circuit. 

So our suggested target is to interconnect at the chip
level (where a chip may in the near future represent quite
a powerful processing element) and above. Since we are also
targeting tens of thousands of such PEs, we need to
look for an appropriate network using optical technology,
not only for feasibility, but also as a real alternative
to electronic networks. We now proceed to present a
list of topics related to such a network, reflecting certain
general principles we think are important for the
successful use of optical technology:

Processing-less  operation:  As  we  have  explained, 
optics  does  not  offer  a  good  match  for  processing  as 
possibly  needed  in  an  interconnection  network  for  an 
MPP  system.  Moreover,  we  think  it  may  not  be  so 
good  to  do  too  much  processing  even  if  optics  could  do 
it  economically.  The  reason  is  that  excessive  process¬ 
ing  may  result  in  higher  time  delays  in  the  network. 
An example of a network with too complex a processing
function is the NYU Ultracomputer by Gottlieb
et al. [4]. This parallel architecture suggested the use
of a multi-stage interconnection network (Omega network).
Each stage in the network had to perform quite
complex processing, needed for combining messages (as
a way of avoiding blocking in certain conditions) and for
a "fetch-and-add" co-ordination that was also implemented by
the network. Such an approach did not work very well
even with electronic VLSI technology. Therefore we
suggest not requiring any processing at all, if possible,
for the optical implementation.

Network  topology:  There  are  many  possible  ways  to 
make  networks.  A  survey  of  many  such  ways  was  pre¬ 
sented  by  T.  Feng  in  [5].  One  of  the  goals  when  looking 
at  all  these  various  network  topologies  is  to  understand 
some  of  the  motives  leading  to  their  suggestion.  One 
such  motive  is  the  limitation  of  electronic  technology 
(circuits,  packaging)  to  implement  large  switching  el¬ 
ements.  For  example,  it  is  hard  to  make  a  1000  by 
1000  crossbar  switch  as  one  VLSI  chip  and  package. 
However,  such  limitations  may  not  be  as  severe  for  op¬ 
tical  technology.  Unfortunately  this  point  tends  not  to 
be  remembered  by  many  researchers  in  optical  inter¬ 
connections.  Using  optics  for  interconnection  networks 
should  be  more  than  just  mimicking  existing  electronic 
networks.  For  an  MPP  interconnection  network,  the 
prime  advantages  of  optics,  namely  connectivity  and 
bandwidth,  must  be  fully  incorporated.  If  all  optics 
does  is  to  replace  a  wire  in  a  multistage  interconnec¬ 
tion  topology  with  a  fiber,  then  such  an  approach  will 
still suffer from the limitations of the topology and the
electronic parts, without gaining much benefit from the
use of optics.

Number of stages: An ideal would be to build a network
with only one stage, or one big crossbar. Unfortunately
it seems that even with the added connectivity
optics offers, it is hard to make a crossbar of 10,000
by 10,000 ports. However, a smaller size of 1,000 by
1,000 might be possible to make. Since we limit our
goal to MPP systems of 10's of thousands of PEs, such
a switch may be all we need. Then the goal will be to
use a minimum number of stages (e.g., 2-3) of such
switches to build a network of the required size. One such
network structure (with other nice properties) is a Clos
network [6].
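
As a rough illustration of the stage-count argument, the sketch below
estimates how many ports a symmetric three-stage Clos network can reach
when no switch is larger than k by k. The function name and the returned
fields are hypothetical; the two conditions used (m >= n for a
rearrangeably nonblocking network, m >= 2n - 1 for a strict-sense
nonblocking one) are the standard Clos results, not figures taken from
this paper.

```python
def clos_max_ports(k, strict=False):
    """Largest port count of a symmetric 3-stage Clos network in which
    every crossbar has at most k inputs and k outputs.

    n = inputs per first-stage switch
    m = number of middle-stage switches
    r = number of first-stage (and last-stage) switches
    """
    r = k                    # each middle switch is r x r, so r <= k
    if strict:
        n = (k + 1) // 2     # strict-sense nonblocking: m = 2n - 1 <= k
        m = 2 * n - 1
    else:
        n = k                # rearrangeably nonblocking: m = n <= k
        m = n
    return {"n": n, "m": m, "r": r,
            "ports": n * r,            # total network inputs (= outputs)
            "switches": 2 * r + m}     # first + last stage + middle stage


if __name__ == "__main__":
    print(clos_max_ports(1000))               # rearrangeable case
    print(clos_max_ports(1000, strict=True))  # strict-sense case
```

Under these assumptions, k = 1,000 gives 1,000,000 ports in the
rearrangeable case and 500,000 in the strict-sense case, both well above
the tens of thousands of PEs targeted here.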

Routing  and  control:  The  interconnection  network’s 
job  is  to  carry  information  from  one  port  to  another. 
In  doing  this  function,  various  conditions  may  occur  for 
which  the  network  has  to  offer  an  operational  solution. 
One  of  the  basic  problems  is  how  to  avoid  a  case  where 
information  originating  from  two  different  sources  is 
targeted  to  the  same  destination.  The  network  must 
control  the  flow  of  information  and  arbitrate,  or  decide 
between  multiple  sources,  which  one  is  to  arrive  at  any 
given  time  exclusively  at  a  target  port.  Another  basic 
operation  a  network  has  to  do  is  the  routing:  the  steer¬ 
ing  of  information  throughout  the  network  structure, 
from  the  source  to  the  destination.  Of  course,  for  a  sim¬ 
ple  crossbar,  such  routing  is  the  connection  between  an 
input  and  an  output.  However,  when  multiple  switch¬ 
ing  elements  (crossbars)  are  passed  by  the  information 
packet,  from  the  source  to  the  destination,  the  issue  of 
routing  becomes  more  complex.  We  maintain  our  prin¬ 
ciple  of  asking  not  to  have  to  do  any  processing  in  our 
optical  network.  To  do  so,  in  this  case,  means  that  no 
routing  or  arbitration  is  to  be  done  on  the  information 
as  it  flows  from  the  source  to  destination.  To  do  this, 
we  need  to  operate  the  optical  part  of  the  network  in  a 
circuit  switching  or  reconfiguration  mode.  This  mode, 
compared  to  what  is  known  as  packet  switching,  means 
that  we  set  the  optical  channels  to  form  point-to-point 
communications.  In  such  a  case,  no  arbitration  or  rout¬ 
ing  is  needed  in  the  optical  part  of  the  network.  Of 
course,  we  may  have  to  change  the  connections  from 
time  to  time  (i.e.,  change  the  point-to-point  connec¬ 
tions  between  the  ports  of  the  network).  However,  we 
would  like  to  make  such  changes  infrequently,  and  when 
we  make  them,  electronic  circuits  will  perform  the  pro¬ 
cessing  or  logic  functions  needed  for  proper  operation. 
The problem with this suggestion is that a straightforward
approach, where the network simply operates in a
reconfigurable mode, is of very limited use. The
PEs in an MPP system usually need to communicate with
several different PEs, and a PE may change quite often the
target PE to which it sends information. A simple
reconfiguration mode of operation may therefore not be enough.
Fortunately, we can take advantage of some observed
behavior of parallel applications. A network that does so is
suggested in [1]; it is a hybrid of a reconfigurable optical
network layer with small electronic crossbar switches.

Physical implementation and scalability: An
important issue of course is the way to make the
network given the wealth of optical technology. As
switching elements, many use devices such as LCLV,
acousto-optical beam deflectors, arrays of VCSELs,
etc. The layout of the network in 3-D space is important
as well. It may be useful to avoid centralization
of devices (such as the use of a large LCLV array in
a matrix-vector type of network). If a 1,000 by 1,000
crossbar switch can be made out of multiple 1 to 1,000
selectors and 1,000 to 1 concentrators, it may be easier
to make than one using a single array with 1,000,000
switchable devices. There are quite a few system
demonstrations of 2 by 2 optical crossbars that will
not scale to any larger, interesting size. Thus a proposal
that works with a small number of channels must be carefully
evaluated for its feasibility to remain attractive at
larger sizes. Another aspect of this is the requirements
imposed on the individual devices. For example, if the
network uses VCSEL arrays or "smart pixels" that are
all required to function at the same time, issues such as
the total power dissipation and possible electrical and
optical crosstalk preclude a larger version of the design.
If a VCSEL array requires individual electrical driving
of each VCSEL (or even an X-Y matrix type of driving),
then the limitations on the I/O pins of
the electronic package may prevent the realization of
a larger network. Although optical technology is good
at passing information in a 2-D pattern, if the processing
is done with electronic PEs, a conversion will
be needed. It then remains to be seen whether the connectivity
bottleneck is still in the system, even though it may have
moved to another location.

4  System  Issues 

The  interconnection  network  is  not  a  stand-alone 
component  of  the  MPP  system.  It  is  only  one  part 
in  a  complete  system.  Although  it  is  viewed  as  one  of 
the  critical  components  that  may  have  severe  impact  on 
the  overall  system  performance,  it  is  always  important 
to  remember  that  it  is  the  overall  system  performance 
that  is  important,  not  only  the  network  performance. 
Evaluating  an  MPP  design  for  good  balance  between 
the  various  components,  can  also  aid  our  understanding 
of  the  needs  the  network  must  meet.  This  may  lead  to 
simplifying  the  network  functions  and  adapting  them 
to best fit the applications and the optical technology
at hand.

Consider for example the previous topic on the need
to have a reconfigurable mode of operating the optical
network. As described in [1], there are many applications
that exhibit what we call switching locality. This
property is a function of the application. But knowing
this, we may simplify the requirements for the network.

Another  example  of  the  system’s  impact  can  be  seen 
by  the  use  of  interconnection  cache  switches  [1].  As 
previously  stated,  the  processing-less  optical  network, 
working  in  a  reconfigurable  mode,  may  not  match  di¬ 
rectly  the  communication  requirements  of  the  PEs  in 
the  MPP  system.  Connecting  a  small,  electronic  switch 
between  the  previously  set  point-to-point  optical  chan¬ 
nels  and  the  PEs  -  a  combination  of  packet-switching 
and  circuit-switching  -  may  result  in  better  fitting  the 
communication  needs  of  applications  exhibiting  switch¬ 
ing  locality  in  their  communication  patterns. 

Software  is  a  very  important  part  of  the  MPP  system 
as  well.  Part  of  the  task  of  making  the  system  work 
depends  on  various  software  components  such  as  the 
operating  system,  compilers,  debuggers,  etc.  Mapping 
and embedding are the phases in which a parallel application
is decomposed into concurrent components that
communicate among themselves in a specific communication
graph. This graph is mapped or embedded into
the  network  structure  such  that  communication  needs 
are  satisfied  by  once  reconfiguring  the  optical  network 
and  then  using  the  interconnection  cache  switches  for 
on-line  routing  and  arbitration  of  the  communication 
messages  in  the  network. 

A typical parallel application may be quite complex.
It is usually composed of several phases. Within each
phase, a different communication pattern may need to
be supported. Switching between phases while minimizing
data movement in the MPP system is important
in avoiding the waste of network bandwidth. An
extension of the interconnection cache principle may
provide a possible solution to this problem by changing
the relative proximity of PEs, in terms of the number of
levels of routing or arbitration that separate each PE
from the other PEs with which it communicates.

Finally,  since  an  MPP  system  may  be  too  expensive 
to be always committed to a single user, it is important
to have good support in the network structure
for  the  partitioning  of  a  big  system  into  sub-systems. 
Such  partitioning  should  allow  each  sub-partition  to 
operate  in  an  independent  way,  without  degradation  of 
performance  because  of  any  type  of  program  running 
on  another  sub-partition.  The  network  presented  in  [1] 
has  such  a  property. 

5  Conclusions 

In  this  paper  we  have  raised  various  issues  we  think 
to  be  important  to  consider  for  making  an  optical  inter¬ 
connection  network  for  MPP  systems.  We  think  gen¬ 
eral  purpose  processing  using  optics  should  be  avoided 
for  a  while.  Thus  we  suggest  a  processing-less  oper¬ 
ation  style  for  an  optical  network.  Such  a  direction 
implies  that  the  network  will  have  to  be  reconfigured 
to  form  point-to-point  connections.  Such  a  reconfigura¬ 
tion  should  be  controlled  by  electronic  processing  and 
should not be done too often. Since most parallel
processing applications cannot limit their communication
to a single destination, the optical network
needs to be complemented
with small, electronic switches - interconnection caches.
These  switches  alternate  between  the  optical  point-to- 
point  connections  and  thus  better  fit  many  parallel  pro¬ 
cessing  applications  that  exhibit  the  property  of  switch¬ 
ing  locality  in  their  communication  patterns.  Finally 
we  suggest  considering  the  operation  of  the  system  as 
a  whole  rather  than  looking  at  the  network  in  isola¬ 
tion.  Other  parts  of  a  system  may  influence  some  of 
the  considerations  made  in  deciding  the  network  prop¬ 
erties  and  functional  needs. 

1. Introduction


This  paper  presents  an  extremely  high  density  and  lightweight  image  processing  system  being 
designed  at  Georgia  Tech.  The  system  consists  of  two  layers.  The  upper  layer  includes  a  focal 
plane  array  of  thin-film  GaAs  detectors  that  are  integrated  directly  on  top  of  an  array  of  Si-based 
SIMD  processors.  The  lower  layer  consists  of  an  array  of  more  powerful  MIMD  processors 
connected  in  a  wormhole  routed  two  dimensional  mesh.  The  two  layers  are  interconnected  via 
through-wafer optical interconnects using integrated InGaAsP devices.


2.  Focal  Plane  Detector  Array 


The  focal  plane  processing  approach  to  optical  interconnect  dispenses  with  the  need  to  electrically 
convey  input  matrices  to  integrated  processing  circuitry  by  incorporating  photosensitive  devices 
on  the  same  substrate  as  the  processing  circuitry.  The  photodetectors  then  perform  as  I/O 
channels.  Optical  interconnect  technology  is  ideal  for  image  processing  tasks,  since  it  can  be  used 
for sampling incident images in real time and completely in parallel.


Using epitaxial liftoff (ELO) technology refined here at Georgia Tech [2], focal plane processors
can utilize direct connections between the photodetector and processing circuitry layers. This
allows for a high fill factor without reducing the area available for signal processing
circuitry or causing inefficient detection of radiant energy in the incident signals. It also allows
independent optimization of the circuit and photosensitive devices through separate growth
processes.


Figure  1:  A  SIMD  Processor  with  Interface  Circuitry 

3. Processor Architecture

The  upper  layer  of  the  system  contains  an  array  of  SIMD  pixel  processors.  Each  node  includes  an 
8 bit datapath with an arithmetic, logic, and shift unit and a 16 bit multiply-accumulator (MACC)
used  in  many  early  image  processing  algorithms.  These  functional  units  access  an  eight  word 
register  file.  Each  node  has  64  words  of  local  memory.  (Up  to  256  words  can  be  addressed  in  the 
instruction  set.)  These  nodes  communicate  through  a  wire-based  nearest  neighbor  network  using 
special  registers  in  the  datapath.  The  lower  layer  contains  an  array  of  more  powerful  MIMD 
processing  nodes  designed  to  efficiently  support  high  throughput  parallel  applications.  These 
nodes  include  a  larger,  32  bit  datapath  and  more  local  memory  containing  4096  36-bit  words.  In 
addition, these nodes are connected by a more powerful interconnection network which supports
non-local  communication  between  nodes.  Figure  2  and  Figure  3  illustrate  the  microarchitecture  of 
the  two  processing  nodes. 


Figure 2: SIMD Pixel Processor

Figure 3: Pica Microarchitecture


4. Through-Wafer Optoelectronic Interconnect

The  upper  and  lower  layers  are  interconnected  via  through-wafer  optical  links.  Thin  film  InGaAsP 
emitters,  which  operate  at  a  wavelength  to  which  silicon  is  transparent,  are  integrated  (emitting 
down)  underneath  the  GaAs  detectors  of  the  top  layer.  Thin  film  InGaAsP  detectors  (receiving 
up)  are  integrated  onto  the  lower  level  of  MIMD  processors.  By  stacking  and  aligning  the  two 
layers,  parallel  unidirectional  optical  links  are  formed  between  the  two  layers.  By  integrating 
sixteen  devices  on  each  chip  operating  at  100  Mbps,  adequate  throughput  is  provided  between  the 
two  layers. 

5.  Status 

One  proposed  system  would  contain  a  32  by  32  array  of  upper  layer  chips.  Each  chip  would 
contain  an  eight  by  eight  array  of  detectors,  and  a  three  by  three  array  of  SIMD  processors.  This 
layer  would  provide  a  256  by  256  detector  focal  plane  array  with  14,000  MIPS  of  processing 
throughput.  The  lower  level  would  contain  a  16  by  16  array  of  MIMD  processors  (on  64  chips) 
providing  12,800  MIPS  of  more  general  purpose  processing.  The  optical  inter-layer 
communication  bandwidth  is  102,400  Mbps  between  layers. 
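
The figures quoted above can be cross-checked with a few lines of
arithmetic. The sketch below is only a consistency check: the constant
names are hypothetical, and it assumes that the inter-layer bandwidth is
counted per lower-layer chip (64 chips with sixteen 100 Mbps links each),
which is the reading under which the quoted 102,400 Mbps is reproduced.

```python
# Values taken from the text; names are illustrative only.
UPPER_CHIPS_PER_SIDE = 32        # 32 by 32 array of upper layer chips
DETECTORS_PER_CHIP_SIDE = 8      # eight by eight detectors per chip
MIMD_PROCESSORS = 16 * 16        # 16 by 16 array of MIMD processors
LOWER_CHIPS = 64                 # ... packaged on 64 chips
LINKS_PER_CHIP = 16              # sixteen optical devices per chip
LINK_RATE_MBPS = 100             # each link operates at 100 Mbps
MIMD_TOTAL_MIPS = 12_800

# Side length of the focal plane array, in detectors.
focal_plane_side = UPPER_CHIPS_PER_SIDE * DETECTORS_PER_CHIP_SIDE

# Assumption: bandwidth is limited by the links on the 64 lower chips.
inter_layer_mbps = LOWER_CHIPS * LINKS_PER_CHIP * LINK_RATE_MBPS

mimd_per_chip = MIMD_PROCESSORS // LOWER_CHIPS
mips_per_mimd = MIMD_TOTAL_MIPS / MIMD_PROCESSORS

print(focal_plane_side, inter_layer_mbps, mimd_per_chip, mips_per_mimd)
```

With these inputs the check gives a 256 by 256 focal plane, 102,400 Mbps
between layers, four MIMD processors per chip, and 50 MIPS per MIMD
processor.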

Integration of an eight by eight array of thin film devices has been demonstrated and is shown in
Figure 4. A prototype chip, shown in Figure 5, containing digital processing circuitry, analog
interface circuitry, and integrated InGaAsP detectors and emitters used for through-wafer
communications has been fabricated and integrated and is currently being tested. A 100 Mbps
receiver amplifier has been fabricated and tested and is described in [6]. Simulators for both types
of processors and the network have been constructed. Algorithms for compensating for detector
array non-uniformities (frequency domain interpolation) are being developed for the SIMD layer.
Object detection algorithms are being developed for the lower level.


Figure  4:  8  x  8  Array  of  devices  integrated  on  Si  interface  circuitry  (shown  on  torch  of  dime) 


Figure 5: Test chip combining digital and interface circuitry and integrated InGaAsP detector and
emitter

Introduction

The  primary  requirements  for  a  tightly  coupled  multi¬ 
processor  interconnection  network  are  high  bandwidth, 
low  latency,  and  a  high  degree  of  connectivity  among 
the  processors.  Multiple  passive  star  networks  are 
attractive  for  multiprocessor  interconnection  networks 
because  they  offer  maximum  connectivity  with  a  con¬ 
stant  optical  power  budget,  and  are  simple,  relatively 
low  cost  yet  robust  structures  [BLM93].  Using  multiple 
passive  stars,  it  is  possible  to  design  completely  recon- 
figurable  networks  without  the  use  of  active  photonic 
switches  [GLZ91]. 

In  this  paper,  we  present  an  architecture  which  fulfills 
the  bandwidth,  latency,  and  connectivity  network 
requirements. The topology is based on a multiple passive
star organization which we call a Partitioned Optical
Passive Star (POPS) network [CLM+94]. In this
network,  a  passive  optical  fabric  is  used  to  implement  a 
reconfigurable  optical  interconnection  network.  All 
switching  is  performed  by  the  nodes  in  the  electronic 
domain  and  control  of  the  switching  is  based  on  a  state 
sequence routing paradigm [CLMQ93].

POPS  Network  Topology 

As  shown  in  Figure  1,  a  POPS  network  consists  of  a  col¬ 
lection  of  nodes,  shown  as  ellipses,  connected  by  pas¬ 
sive  star  couplers,  shown  as  rectangles  in  the  figure.  All 
nodes  include  a  transmitter  and  a  receiver  section.  The 
transmit  and  receive  sections  for  each  node  are  shown 
separately  on  the  left  and  right  sides  of  the  figure.  Using 
multiple  passive  star  couplers  the  nodes  are  partitioned 
into  groups  such  that  each  group  shares  common  inputs 
or  common  outputs  from  among  a  set  of  couplers.  Each 
node  has  the  same  number  of  input  and  output  lines 
(referred  to  as  channels)  as  there  are  groups. 

POPS  networks  are  distinguished  from  other  types  of 
multiple  passive  star  topologies  in  that  all  couplers  have 
symmetric  and  equal  fanin  and  fanout.  Also,  the  nodes 
are  completely  connected  with  couplers  using  parallel 
channels without hierarchical interconnections. Thus, a
path  exists  between  every  pair  of  nodes  and  each  path 
traverses  exactly  one  coupler. 

All  POPS  networks  are  characterized  by  the  parameter 
triple  (n,  d,  r).  The  first  parameter,  n,  is  the  number  of 
nodes.  The  second  parameter,  d,  is  the  partition  size. 
This parameter sets the size of each group and the fanin/fanout
of the couplers. The third parameter, r, characterizes
the redundancy of the network. We define g = n/d,
which represents the number of groups into which the
nodes have been partitioned. Each of the couplers in
Figure 1 is identified by a pair (i, j), where i is the
group number of the nodes which share the input side of
the coupler and j is the group number of the nodes
which share the output side of the coupler. A POPS network
is constructed by appropriately connecting couplers
for all possible values (0 ≤ i < g, 0 ≤ j < g). Each
node  is  connected  to  the  inputs  of  g  couplers  and  is 
capable  of  independently  transmitting  a  message  into 
any  one.  Similarly,  each  node  has  g  receivers  connected 
respectively  to  the  output  side  of  g  couplers  and  may 
independently  receive  a  message  from  any  of  the  cou¬ 
plers  on  its  receive  side.  Switching  and  configuration  of 
the  network  is  accomplished  by  selecting  the  appropri¬ 
ate  output  and  input  channels  at  each  node.  It  is  neces¬ 
sary  to  provide  a  control  mechanism  to  enable  the 
transmitters  and  receivers  to  execute  the  selection  oper¬ 
ations.  This  control  mechanism  is  performed  using  state 
sequence  control. 
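
The structure described above can be sketched in a few lines. This is a
minimal illustration, not the authors' implementation: the function names
are hypothetical, the redundancy parameter r is ignored, and nodes are
assumed to be numbered so that consecutive blocks of d nodes form a group.

```python
from itertools import product


def pops_couplers(n, d):
    """Labels (i, j) of the couplers in a POPS network with n nodes and
    partition size d. There are g = n / d groups and g * g couplers."""
    if n % d != 0:
        raise ValueError("partition size d must divide n")
    g = n // d
    return list(product(range(g), repeat=2))


def group_of(node, d):
    # Assumption: nodes 0..d-1 form group 0, d..2d-1 form group 1, etc.
    return node // d


def coupler_for_path(src, dst, d):
    """The single coupler traversed by a message from src to dst:
    input side shared by src's group, output side by dst's group."""
    return (group_of(src, d), group_of(dst, d))


if __name__ == "__main__":
    couplers = pops_couplers(512, 64)
    print(len(couplers))                  # number of couplers, g * g
    print(coupler_for_path(0, 511, 64))   # coupler used from node 0 to 511
```

For the (512, 64, 1) network used later in the simulation results this
gives g = 8 groups and 64 couplers, each with a fanin and fanout of 64.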

Distributed State Sequence Control

The  goal  of  state  sequence  control  is  to  decouple  the 
network  throughput  from  the  bandwidth  of  the  elec¬ 
tronic  control  system  which  routes  each  message.  By 
exploiting  locality  in  the  message  traffic,  the  latency  of 
each control operation is amortized over several message
transfers. In other words, there exists a sequence of
selection operations of length k which contains all paths
in the current traffic. The corresponding sequence of
network states is referred to as the state sequence. The
transformation  of  the  state  sequence  is  accomplished  by 
monitoring  the  status  of  the  nodes  for  the  occurrence  of 
a  sequence  fault.  When  a  sequence  fault  occurs,  the 
sequence  transformer  must  modify  the  state  sequence  to 
include  the  requested  path. 

In  our  implementation,  both  the  state  sequence  genera¬ 
tion  and  state  sequence  transformation  functions  are  dis¬ 
tributed  to  designated  nodes  within  the  network.  Each 
group  of  nodes  in  the  POPS  topology  has  within  it  a  des¬ 
ignated  control  node  which  has  the  additional  responsi¬ 
bility  of  implementing  the  control  functions  for  that 
group.  The  state  sequence  generation  function  is  parti¬ 
tioned  such  that  the  control  node  for  each  group  gener¬ 
ates  the  state  word  corresponding  to  paths  which 
originate  in  that  group.  Similarly,  state  sequence  trans¬ 
formation  is  partitioned  such  that  the  control  node  in 
each  group  services  sequence  faults  for  paths  originat¬ 
ing  within  that  group. 

Each of the control nodes examines the status words generated
within its own group for any field which indicates
a  sequence  fault.  If  a  fault  is  detected  by  the  control 
node,  the  sequence  fault  service  algorithm  selects  a 
location  in  the  state  sequence  into  which  the  faulted  path 
can  be  placed.  Thus,  the  control  unit  in  response  to  a 
sequence  fault  transforms  the  state  sequence  by  over¬ 
writing  an  existing  path  with  the  faulted  path. 
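
The fault-service step can be illustrated with a small sketch. The class
and method names are hypothetical, a path is represented simply as a
(source, destination) pair, and the victim slot is chosen by exact
least-recently-used order as a stand-in; the text only says that an
existing path is overwritten, and the simulation section mentions an LRU
approximation.

```python
class StateSequence:
    """State sequence of length k held by one control node (sketch)."""

    def __init__(self, k):
        self.slots = [None] * k      # path stored at each step of the sequence
        self.last_used = [-1] * k    # logical time of last use; -1 = never used
        self.clock = 0
        self.faults = 0

    def request(self, path):
        """Register a request for `path`. Returns True if it caused a
        sequence fault (the path was not in the current sequence)."""
        self.clock += 1
        if path in self.slots:
            # Path already scheduled: no control operation is needed.
            self.last_used[self.slots.index(path)] = self.clock
            return False
        # Sequence fault: overwrite the least recently used slot.
        victim = min(range(len(self.slots)), key=self.last_used.__getitem__)
        self.slots[victim] = path
        self.last_used[victim] = self.clock
        self.faults += 1
        return True


if __name__ == "__main__":
    seq = StateSequence(k=2)
    for p in [(0, 1), (0, 2), (0, 1), (0, 3)]:
        print(p, "fault" if seq.request(p) else "hit")
    print(seq.slots, seq.faults)
```

Repeated requests for paths already in the sequence cause no faults,
which is how locality in the traffic lets the cost of one control
operation be spread over several message transfers.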


Implementation 

Figure  2  is  a  block  diagram  showing  the  constituent 
parts  of  a  node.  Each  node  consists  of  a  host  processor,  a 
bus  interface  residing  on  the  host  bus,  memory  commu¬ 
nicating  directly  with  the  bus  interface  forming  part  of 
the  shared  network  memory,  a  node  bus,  a  set  of  chan¬ 
nels  and,  in  the  case  of  a  control  node,  the  control  node 
logic.  The  node  represented  in  the  figure  is  assumed  to 
be any member of the k-th group. The pair (i, j) shown at
the output and input of each channel denotes the coupler
to which the channel is connected.

Data transfers originating with the host processor enter
the network via the bus interface. The bus interface performs
translation between the physical address space of
the host system and the network address space. The network
address space consists of node memory units resident
on each node which collectively form the network
shared memory. The bus interface also provides buffering
and supports asynchronous transfers over the node
bus of data moving between individual channel controllers
and the host processor.

Figure 2. Node Architecture


Network  operations  are  treated  as  extensions  to  the  bus 
cycle  of  the  host  processor  which  originates  the  data 
transfer.  Specifically,  the  bus  interface  supports  the  fol¬ 
lowing  network  operations:  read,  write  and  atomic  read/ 
modify/write.  When  the  host  processor  bus  initiates  one 
of  these  operations  to  a  network  address,  the  host  pro¬ 
cessor  bus  cycle  is  extended  until  the  completion  of  the 
required  network  operations.  At  the  destination  node, 
the  bus  interface  allows  data  to  be  transferred  to  and 
from  node  memory  without  intervening  operations  on 
the  local  host  bus. 


Each  network  transfer  occurs  via  one  of  the  channel 
controllers  attached  to  the  node  bus.  Each  channel  con¬ 
troller  consists  of  a  transmitter,  receiver  and  control 
logic  to  implement  the  necessary  control  operations.  A 
node  having  the  additional  responsibility  of  being  a  con¬ 
trol  node  contains  control  logic  which  performs  the  gen¬ 
eration  and  transformation  of  the  state  sequence  and 
communicates  with  the  network  by  means  of  the  appro¬ 
priate  channel.  The  complete  design  can  be  found  in 
[Tez94].


Simulation  Results 

An  event  driven  simulator  was  constructed  to  assess  the 
dynamic  performance  of  the  network  under  conditions 
of  differing  message  traffic  load,  sequence  length  and 
network  size.  The  simulator  operates  using  a  timebase 
that  consists  of  two  time  units  per  step  in  the  state 
sequence.  The  propagation  delay  of  a  message  within 
the network is defined to be 2 steps, one for control
and  one  for  data.  Sequence  faults  are  serviced  by  the 
control  mechanism  in  the  order  of  their  arrival,  in  paral¬ 
lel  for  each  channel.  Insertion  of  the  requested  path  into 
the  state  sequence  is  performed  by  an  LRU  approxima¬ 
tion  algorithm. 

Figure 3 shows a plot of average latency of a (512, 64, 1)
network for sequence lengths of 4 to 48. Latency is the
time for a message to be transferred from the buffer of
the originating node to the buffer of the receiving node.
The solid line represents the latency values for a traffic
load of 20% and the dotted line for a load of 15%. It can
be observed that there is an optimum sequence length at
which latency is minimized for both load values.


Figure  3.  Latency  vs.  Sequence  Length 

Figure 4 shows a plot of the percentage of faults with
respect to messages generated for differing sequence
lengths within the same range as in Figure 3. The curves
represent 15% and 20% traffic loads as above. It can be
seen that the percentage of faults decreases with increasing
sequence length, reaching a minimum value for
sequence lengths of 10 to 24 for loads of 15% and 20%
respectively.

Figure  4.  Fault  Percentage 

Figure  5  shows  the  average  latency  with  respect  to  net¬ 
work  size.  The  network  group  size  was  scaled  propor¬ 
tionally  to  the  square  root  of  the  size  of  the  network. 

Figure 5. Latency vs. Network Size

1. Introduction

Two-dimensional  (2D)  arrays  of  microlasers  are  manufactured  in  two  primary  configurations:  individually 
addressable  [1],  and  matrix  addressable  [2],  as  illustrated  in  Figure  1.  Each  microlaser  in  the  individually 
addressable  array  has  a  ground  (n)  terminal  and  a  positive  (p)  terminal.  All  of  the  microlasers  share  the  same 
ground,  but  a  separate  p  contact  is  provided  for  each  microlaser.  An  8x8  array  thus  requires  64  p  contacts,  as 
indicated  by  the  numbered  bonding  pads  at  the  edges  of  the  array.  For  small  arrays  individual  addressing 
works  well,  but  the  complexity  becomes  unmanageable  as  the  arrays  scale  to  large  sizes,  and  so  an  alterna¬ 
tive  configuration  is  needed  that  scales  more  gracefully. 

For the matrix addressable array, each row of microlasers shares the same ground. For the 8 rows shown in
Figure 1b, there are 8 independent n lines which are each connected to a distinct bonding pad. The p lines are
connected to the columns in a similar manner, and so there are 8 independent p lines, which connect the p
contacts of the 8 microlasers in a column. In order to enable a microlaser at location (i, j), in which i
identifies a row and j identifies a column, the corresponding i row and j column bonding pads must be
enabled. The n ground is applied to the row pad and the p voltage is applied to the column pad. If a voltage
is applied to more than one pad, then the corresponding collection of microlasers is enabled. In Figure 1b, a
potential is applied across rows 2, 3, and 5 and columns 3, 4, and 7, which enables the nine microlasers at the
corresponding crosspoints. Notice that only six bonding pads are used, as opposed to the nine bonding pads
for the same individually addressable configuration shown in Figure 1a.

An advantage of the matrix addressable configuration is that the number of bonding pads grows only as 2N
for an NxN array (16 pads rather than 64 for the 8x8 array), which allows for a simplified electronic
interface. A disadvantage is that the user loses a degree of freedom in selecting combinations of logic
gates to enable or disable. For example, in Figure 1b, there is no combination of enabled rows and columns
that will generate a checkerboard pattern. Despite the limited number of possible on/off combinations for a
matrix addressable array, the complexity of the electronic addressing is simplified, which is an important
practical consideration.
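
The trade-off described above can be checked with a short sketch. It models the matrix addressable array as the Boolean outer product of a row-enable set and a column-enable set. The function names (`enabled`, `reachable`), the 0-based indexing, and the use of a 4x4 checkerboard for the exhaustive search are assumptions made for illustration only.

```python
from itertools import product

def enabled(rows, cols, n):
    """0/1 matrix of the microlasers lit when the given row and column pads are driven."""
    return [[int(i in rows and j in cols) for j in range(n)] for i in range(n)]

def reachable(target):
    """True if a single choice of row pads and column pads produces exactly `target`."""
    n = len(target)
    for row_bits in product([0, 1], repeat=n):
        rows = {i for i in range(n) if row_bits[i]}
        for col_bits in product([0, 1], repeat=n):
            cols = {j for j in range(n) if col_bits[j]}
            if enabled(rows, cols, n) == target:
                return True
    return False

# Rows 2, 3, 5 and columns 3, 4, 7 of the 8x8 array in the text (converted to 0-based).
rows, cols = {1, 2, 4}, {2, 3, 6}
lit = enabled(rows, cols, 8)
num_lit = sum(map(sum, lit))        # number of microlasers switched on
pads_used = len(rows) + len(cols)   # number of bonding pads driven

# Exhaustive search over all row/column selections of a 4x4 array.
checkerboard = [[(i + j) % 2 for j in range(4)] for i in range(4)]
print(num_lit, pads_used, reachable(checkerboard))
```

The search is exponential in the array size and is only meant for very small arrays; it confirms that nine crosspoints are driven from six pads and that the checkerboard is not available as a single row/column selection.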




Figure 1: (a) Individually addressable microlaser array; (b) matrix addressable microlaser array.

Here, we describe an algorithm for decomposing arbitrary patterns for a 2D microlaser array into a set of
subpatterns that, when applied in succession, achieve the desired target pattern. We begin by describing a
mathematical model for the problem. We then develop an algorithm for the optimal decomposition of binary
matrices into matrix addressable submatrices. Finally, we relate the decomposition method to
Quine-McCluskey (tabular) reduction of Boolean functions [3].

2.  Summary 

In developing a mathematical model for the approach, we first explore the structure of patterns. A pattern
P = (m_ij) is a Boolean matrix in which all points satisfy the relationship:

$$\forall (i,j),(k,l):\quad (m_{ij} = 1 \ \text{and}\ m_{kl} = 1) \iff (m_{il} = 1 \ \text{and}\ m_{kj} = 1)$$

Notice that a pattern may have intervening points between the corners that are not part of the pattern, that a
pattern may have zero extent in any dimension, and that a pattern may have any number of points as long as
the above relationship is satisfied.

The above definition of pattern, although exact, lacks insight into the inherent structure of patterns. This
motivates us to recast the problem. We begin by making an important observation: All nonzero row-vectors
of a pattern are the same, and all nonzero column-vectors of a pattern are the same. The outer product, Q,
of two Boolean vectors, R = (r_i) and C = (c_j), is Q = R x C.

Now  we  investigate  the  interaction  between  patterns. 

Lemma  1:  Every  Boolean  matrix  B  can  be  expressed  as  a  Sum  (a  Boolean  ORing)  of  patterns.  This  decom¬ 
position  need  not  be  unique. 

Lemma  2:  The  outer  product  of  two  Boolean  vectors  is  always  a  pattern.  Furthermore,  every  nonzero 
pattern  can  be  expressed  uniquely  as  an  outer  product  of  two  Boolean  vectors. 


Definition: Patterns P_1 and P_2 are said to be mergeable if and only if their sum P = P_1 + P_2 is also a pattern.
Moreover, we say the merge is qualified (q-mergeable) if the sum P is not equal to either P_1 or P_2.

Theorem (q-Merging Rule)

P_1 (= R_1 x C_1) and P_2 (= R_2 x C_2) are q-mergeable (i.e. P_1 + P_2 is also a pattern) if and only if R_1 = R_2 or
C_1 = C_2. Furthermore, the structure of the resultant pattern P (= P_1 + P_2 = R x C) can be defined as follows:

$$R_1 = R_2 \;\Rightarrow\; R = R_1,\quad C = C_1 + C_2$$

$$C_1 = C_2 \;\Rightarrow\; C = C_1,\quad R = R_1 + R_2$$


Now, we recast the problem as decomposing a 0-1 matrix B into a logical sum of the minimum number of
patterns. We describe an algorithm (the “a-Algorithm”) that creates an optimal decomposition. A Boolean
matrix with a single nonzero entry is defined as a unit matrix. Every unit matrix is a pattern, and every
Boolean matrix can thus be expressed as the sum of its constituent unit patterns. With respect to the Boolean
matrix B, this set is called the set of fundamental patterns. The a-Algorithm begins with the set of fundamental
patterns for a given Boolean matrix B = (b_ij). By repeated application of the q-merging rule, the first
stage of the a-Algorithm constructs a set of prime implicant patterns, in which no two patterns are
q-mergeable. The second stage of the a-Algorithm operates on the set of prime-implicant patterns, and on the
set of fundamental patterns in which every pattern is not q-mergeable with any of the fundamental patterns.

a-Algorithm

Input: The Boolean matrix B = (b_ij).

Preprocessing Step: Construct F, the set of fundamental patterns of B.

First Stage

    Let W = F. Mark each element of W uncovered.
    While there exist two q-mergeable patterns P_1, P_2 in W:
    Begin
        Let P = P_1 + P_2.
        Mark P_1 and P_2 covered.
        Let W = W ∪ {P}.
        Mark P uncovered.
    End
    Let I be the set of all uncovered patterns of W.

Second Stage

    For each pattern K_i ∈ I, we construct a set H_i, the set of all fundamental patterns that are not
    q-mergeable with K_i. Intuitively, H_i is the set of fundamental patterns that are “covered” by K_i.
    Select C ⊆ I, such that the union of the H_i over all K_i ∈ C equals F and C is minimal.

Output: C is the minimal set of patterns that will activate the binary matrix B.
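
The two stages can be sketched in Python for small matrices. This is an illustrative reading of the algorithm, not the authors' implementation: a pattern is stored as a pair (row set, column set), the first stage closes the fundamental patterns under q-merging, and the second stage is solved by brute-force search over subsets of prime patterns. Taking H_i to be the unit patterns contained in K_i, and all function names, are assumptions.

```python
from itertools import combinations

def fundamental_patterns(B):
    """One unit pattern (rows, cols) per nonzero entry of B."""
    return {(frozenset([i]), frozenset([j]))
            for i, row in enumerate(B) for j, v in enumerate(row) if v}

def q_merge(p1, p2):
    """Return P1 + P2 if the two patterns are q-mergeable, else None."""
    (r1, c1), (r2, c2) = p1, p2
    if r1 == r2:
        merged = (r1, c1 | c2)
    elif c1 == c2:
        merged = (r1 | r2, c1)
    else:
        return None
    return None if merged in (p1, p2) else merged

def prime_patterns(B):
    """First stage: patterns that never take part in a q-merge (left uncovered)."""
    W = set(fundamental_patterns(B))
    covered, frontier = set(), set(W)
    while frontier:
        new = set()
        for p in frontier:
            for q in list(W):
                m = q_merge(p, q)
                if m is not None:
                    covered.update((p, q))
                    if m not in W:
                        new.add(m)
        W |= new
        frontier = new
    return W - covered

def cells(p):
    """Set of (row, column) positions switched on by pattern p."""
    return {(i, j) for i in p[0] for j in p[1]}

def minimal_cover(B):
    """Second stage: smallest set of prime patterns whose OR equals B (exhaustive)."""
    primes = sorted(prime_patterns(B), key=lambda p: (sorted(p[0]), sorted(p[1])))
    target = {(i, j) for i, row in enumerate(B) for j, v in enumerate(row) if v}
    for k in range(len(primes) + 1):
        for subset in combinations(primes, k):
            if set().union(*(cells(p) for p in subset)) == target:
                return list(subset)

def activate(patterns, n_rows, n_cols):
    """OR the patterns together into a 0/1 matrix."""
    on = set().union(*(cells(p) for p in patterns))
    return [[int((i, j) in on) for j in range(n_cols)] for i in range(n_rows)]

B = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
cover = minimal_cover(B)
print(len(cover), activate(cover, 3, 3) == B)
```

Both stages are exponential in the worst case, so the sketch is useful only for checking small examples; a practical version would substitute a set-cover heuristic in the second stage.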

The  a- Algorithm  resembles  the  classic  Quine-McCluskey  (Q-M)  method  for  reduction  of  two-level  Bool¬ 
ean  expressions  [3].  The  major  difference  is  in  the  first  stage.  However,  any  heuristic  that  applies  to  the  first 
stage  of  the  Q-M  method  can  be  suitably  modified  to  apply  to  the  first  stage  of  the  a- Algorithm.  The  second 
stage  is  essentially  the  Set  Cover  Problem  [4],  to  which  there  are  many  heuristics  and  approximate  solutions. 

A remaining unexplored problem is whether a simple row-column matrix addressing approach leads to the
minimal overall decomposition, as compared to the same number of bonding pads applied to a different
wiring pattern.

Two key difficulties in the implementation and
use  of  multistage  interconnection  networks  have  been 
the  complexity  of  the  network  hardware  and  the  com¬ 
plexity  of  the  routing  algorithm.  This  has  been  par¬ 
ticularly  evident  in  MIMD  computing  environments, 
when  the  network  needs  to  support  arbitrary  inter¬ 
connection  pattern  requests,  and  when  no  requests 
are  to  be  buffered  or  postponed.  Under  the  premise 
that  the  use  of  optics  can  at  least  partially  allevi¬ 
ate  the  network  hardware  complexity  issue,  we  con¬ 
sider  in  this  paper  the  routing  algorithm  complex¬ 
ity  needed  to  control  such  an  optical  network.  We 
present  a  routing  algorithm  designed  with  the  goal  of 
minimal  time  complexity. 

Extended  Generalized  Shuffle  (EGS)  networks 
[1,2]  provide  extended  capability  and  flexibility  over 
conventional  shuffle-exchange  networks  by  removing 
restrictions  on  the  specific  interconnection  patterns 
used  and  allowing  tradeoffs  between  network  width 
and depth. EGS networks are also particularly suited
to optical implementation, due to the many optically
implementable shuffle-equivalent topologies [3].


An EGS network is the central element of
the Shared Memory Optical/Electronic Computer
(SMOEC) [4]. The SMOEC is a fine-grained parallel
computer architecture which consists of N processing
elements and N memory modules interconnected
by a passive optical implementation of an
EGS network [4, 5]. This network is the Free-space
Interconnection with Externally-controlled Routing
(FIER). The bidirectional passive 2x2 switches
within the FIER implement message combining in the
processor→memory (forward) direction and message
broadcasting in the memory→processor (reverse) direction.
The FIER is circuit-switched by an external
controller (described below). Although the FIER was
designed as an essential component of the SMOEC,
it is a self-contained subsystem which may be used in
other computation or communication systems.

In this paper, a regular simplified class of EGS
networks is considered, as shown in Fig. 1. In this
simplified form, each stage is identical, all shuffles
are 2-shuffles (except the final shuffle before the final
“fan-in”), and all switches are 2x2 switches. Data
from N = 2^n network inlets pass through a “fan-out”
stage which is a set of 1-to-F demultiplexers,
then through the main section of Ss shuffle-exchange
stages, and through an F-shuffle followed by a “fan-in”
stage which operates as a set of F-to-1 multiplexers,
ultimately emerging at the N network outlets.
An example EGS network is shown in Fig. 2. This
example illustrates a special case of regular symmetric
networks in which F is a power of two (F = 2^f).
In this case, the “fan-in” and “fan-out” stages may
each be implemented by f stages of 1-to-2 switches
or 2-to-1 switches as shown.

[Figure 1 legend: 2-shuffle; 1-to-F demux switch; F-shuffle; F-to-1 mux]

Figure 1: Regular, simplified NxN EGS network.

[Figure 2 labels: inlets; outlets; 2-shuffle, 2-shuffle, 2-shuffle, 4-shuffle; 1-to-2 switch (demux); 2-to-2 switch (bypass/exchange); 2-to-1 switch (mux)]

Figure 2: EGS example with Ss = 3, F = 4 = 2^2.


Relations between the EGS parameters n, Ss,
and F for nonblocking network operation are provided
in Table 1. As the number of stages is increased
from 1 to 2n−3, the minimal required F is decreased.
However, raising Ss above 2n−3 does not result in a
further decrease in the minimal required F.

Table 1: Nonblocking conditions for regular symmetric
NxN EGS networks (from [6]).

            Ss even                                               Ss odd
Ss ≤ n      $F \ge 2^{n-S_s}(1.5 \times 2^{S_s/2} - 1)$               $F \ge 2^{n-S_s}(2^{(S_s+1)/2} - 1)$
Ss > n      $F \ge 2^{n-S_s}(1.5 \times 2^{S_s/2}) + S_s - n - 1$     $F \ge 2^{n-S_s}(2^{(S_s+1)/2}) + S_s - n - 1$

The special value Ss = 2n−3, with the minimal
F = n, has particular significance in that it results
in the minimum device cost (number of switches) for
that particular n [6]. In the work presented here, F
is restricted to be a power of two (F = 2^f) for ease of
implementation with 1-to-2 and 2-to-1 switches (as
used in the SMOEC). For this special case, the new
minimal device cost may occur at different values F'
and Ss'. These new values are found for a given n
by rounding F = n up to the next higher power of 2
(F'), then working backwards from the equations in
Table 1 to find the new Ss'.
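
The procedure for choosing F' and Ss' can be sketched as follows. The sketch assumes that the Table 1 conditions read F ≥ 2^(n−Ss)(1.5·2^(Ss/2) − 1) for even Ss ≤ n and F ≥ 2^(n−Ss)(2^((Ss+1)/2) − 1) for odd Ss ≤ n, with the "− 1" inside the parentheses replaced by "+ Ss − n − 1" outside them for Ss > n. This reading is an assumption; it has been checked only against the F and Ss columns of the simulation table. It also assumes that "working backwards" means taking the smallest Ss whose bound does not exceed F'. The function names are hypothetical.

```python
def f_min(n, ss):
    """Smallest fan-out F allowed by the assumed nonblocking conditions."""
    if ss % 2 == 0:
        core = 1.5 * 2.0 ** (ss // 2)
    else:
        core = 2.0 ** ((ss + 1) // 2)
    if ss <= n:
        return 2.0 ** (n - ss) * (core - 1)
    return 2.0 ** (n - ss) * core + ss - n - 1

def power_of_two_design(n):
    """Round F = n up to a power of two, then find the smallest Ss that permits it."""
    f = 1
    while f < n:
        f *= 2
    ss = 1
    while f_min(n, ss) > f:
        ss += 1
    return f, ss

for n in range(5, 13):
    print(n, power_of_two_design(n))
```

Under these assumptions the bound at Ss = 2n−3 evaluates to F = n, in agreement with the statement above, and the (F', Ss') pairs for n = 5 to 12 match the parameters used in the simulations.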

A nonblocking EGS network has a multitude
of paths available between each inlet-outlet pair.
Richards’ path hunt algorithm [1] completes a single
routing request in constant time. It parallelizes
the routing of multiple requests by pipelining the
requests and by processing a small constant number of
these pipelined requests in parallel. Thus Richards’
algorithm takes O[N] time to process N simultaneous
routing requests. A new algorithm, the Flexible
Localized Algorithm for EGS-network Management
(FLAEM), is presented here which processes each of
N routing requests in parallel for a regular simplified
combining EGS network. This algorithm is designed
to control a circuit-switched network in
approximately O[log N] time.

The FLAEM routing algorithm can be implemented
in parallel using a separate electronic control
unit [4], the Multifunctional Arbitrator of Traffic
for Shuffle-exchange Hardware (MATSH). The
MATSH is a single bidirectional electronic
shuffle-exchange stage with feedback connections. Each node
in  the  MATSH  is  a  multifunctional  switch  containing 
an  enhanced  bypass/exchange  switch,  some  memory, 
and  some  elementary  logic  functions.  Nodes  in  the 
MATSH  are  capable  of  true  fan-out  (in  the  forward 
direction)  so  that  multiple  copies  of  routing  requests 
may  be  produced  and  used  to  try  multiple  paths  in 
parallel.  Once  the  full  set  of  switch  settings  is  cal¬ 
culated  by  the  FLAEM,  these  settings  are  sent  in 
parallel  to  circuit-switch  the  FIER  optical  network. 

To facilitate the explanation of the FLAEM procedure,
the MATSH control unit will be described as
if it were a full St = Ss + 2 = O[log N] stage
shuffle-exchange electronic network, in which processing
progresses (forward or reverse) one stage at a time.

In the SMOEC, multiple requests that are destined
for the same outlet node are always assumed
to be combinable. Thus the FIER optical network
is capable of implementing many-to-one connection
patterns in the forward direction. This combinability
is an essential assumption of the FLAEM method.
Thus, the FLAEM is applicable only to EGS networks
that can work with such a strong combinability
assumption. However, a trivial case of this assumption
(zero requests destined for the same outlet node)
yields the result that the FLAEM is also applicable
to EGS networks that are restricted to process only
permutation (one-to-one) connection patterns.

The  FLAEM  procedure  to  satisfy  N  routing  re¬ 
quests  is  discussed  in  three  parts:  (1)  A  forward  rout¬ 
ing  pass,  (2)  a  reverse  pass  to  communicate  back  to 
the  inlet  nodes  which  requests  made  it  through  to  the 
outlet  nodes,  and  (3)  a  forward  pass  to  fix  in  place 
selected  winning  returned  requests  and  reroute  un¬ 
successful  requests.  Parts  (2)  and  (3)  are  repeated  as 
a  unit  until  all  requests  are  satisfied.  Each  execution 
of parts (2) and (3) together is counted as a “try”.
Each  of  these  three  FLAEM  procedure  parts  will  now 
be  explained  in  detail. 

Part  (1)  First,  the  MATSH  links  are  initial¬ 
ized  to  Free.  Then  F  copies  of  each  routing  request 
(marked  Run)  are  made  by  the  initial  stage  of  1-to-F 
fan-out  switches  in  the  MATSH  control  unit.  Each 
request  copy  is  assigned  a  random  unique  “priority” 
value  from  0  to  F-1.  Each  request  is  also  assigned 
a  randomly  selected  EGS  path  vector  [2],  which  spec¬ 
ifies  one  of  the  many  paths  available  from  the  inlet 
node  to  its  destination  outlet  node.  The  priorities, 
path  vectors,  and  request  state  (e.g.  Run)  are  stored 
in  the  memories  associated  with  the  links  as  the  re¬ 
quest  is  routed. 

Stage by stage, the request copies are propagated
forward when possible. The two possible inputs
to each link in the MATSH are processed sequentially
in random order, so that a link may be Free or occupied
on this first pass. Let Ln denote the link in the
next stage that was specified by the path vector of a
request at link Lc in the current stage. The request
at Lc may be either simply routed forward, combined
with another request, or aborted due to conflict with
another request, as follows. The request is routed
forward if the link Ln is marked Free. If Ln is occupied
by another request that is destined for the same
outlet node, the request at Lc may be combined with it.
In this case, the link Lc is marked Combined and is
not individually routed further. The request at Ln is
given the maximum of the two priorities. The later
return pass of part (2) will further process the
Combined request as appropriate. If Ln is occupied by
another request with an incompatible destination, its
priority is compared with the priority of the request at Lc,
and the one with the higher priority wins while the
other is aborted (marked Conflict). If Lc is in a stage
that is early enough in the MATSH that it still
has more than one path available to its destination
outlet node, an aborted request may then instead try
to propagate or combine with the other available link
that it can reach in the next stage. (It is also possible
to make extra copies of requests that encounter
extra free nodes in the network, although this was
not implemented in the FLAEM simulation results
presented below.)
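
The per-link decision of Part (1) can be summarized in a small sketch. It is a simplified illustration, not the MATSH logic itself: the `Request` class, the state strings, and the rule that the occupant keeps the link on a priority tie are assumptions, and the retry on the alternative link is omitted.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Request:
    dest: int            # destination outlet node
    priority: int        # value in 0..F-1
    state: str = "RUN"   # RUN, COMBINED, or CONFLICT in this sketch

def arbitrate(incoming: Request, occupant: Optional[Request]) -> Request:
    """Return the request that holds link Ln after `incoming` arrives from Lc."""
    if occupant is None:
        # Ln is Free: simply route forward.
        return incoming
    if occupant.dest == incoming.dest:
        # Same outlet: combine, and keep the larger of the two priorities on Ln.
        incoming.state = "COMBINED"
        occupant.priority = max(occupant.priority, incoming.priority)
        return occupant
    if incoming.priority > occupant.priority:
        # Incompatible destinations: the higher priority wins the link.
        occupant.state = "CONFLICT"
        return incoming
    incoming.state = "CONFLICT"   # tie or lower priority (tie rule is an assumption)
    return occupant

a = Request(dest=3, priority=2)
b = Request(dest=3, priority=5)
c = Request(dest=7, priority=1)
holder = arbitrate(a, None)       # free link: a is routed forward
holder = arbitrate(b, holder)     # same destination: b is combined with a
holder = arbitrate(c, holder)     # different destination, lower priority: c is aborted
print(holder.dest, holder.priority, b.state, c.state)
```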

Part  (2)  After  the  first  pass  through  the 
MATSH,  all  requests  that  made  it  to  the  final 
stage  are  marked  Thru.  A  reverse  pass  through 
the  MATSH  is  now  performed,  propagating  Thru- 
marked  requests  in  the  reverse  direction,  and  any 
aborted  Conflict  requests  (including  requests  that 
were  combined  with  aborted  requests)  are  erased. 

Part (3) After this reverse pass, each inlet
node has some number (possibly zero) of Thru requests.
A single winning Thru request (the lowest
priority request was selected for the FLAEM simulation)
for each inlet node is marked Fixed. If some
inlet nodes had zero Thru requests, then the Run
requests from those unsuccessful inlet nodes are again
each copied F times using the fan-out stage and assigned
random unique priorities and individual new
path vectors. The Fixed requests are now propagated
forward through the MATSH (changing winning
Thru requests to Fixed at each stage), while
non-winning Thru requests are erased. At the same
time, the new Run requests are routed as previously
described, except that no new request can displace
a Fixed request. The full set of N routing requests
is considered satisfied after each inlet node has
a Fixed (completed) request.

Results from FLAEM simulation are presented
in Table 2. Each line of data shows the EGS parameters
(the F = 2^f special case described above),
the number of randomly assigned unrestricted patterns
(many-to-one permitted) or permutations
(one-to-one) routed, the distribution of the number of tries
(listed as a percent of the number of patterns simulated),
and the average number of tries that each pattern took. These
data indicate that for 5 ≤ n ≤ 12, the maximum number
of tries is (almost always) 2. The number of tries
per pattern increases very slowly as n increases for a
given F, then drops again as F is increased. Thus
the number of tries necessary for a complete routing
appears empirically to be a small near-constant value
(since results for n ≥ 20 are unlikely to be of practical
interest).

Table 2: FLAEM simulation results.

Unrestricted patterns (many-to-one permitted):

  n      N     F    Ss    No. of      No. of Tries (%)        No. Tries
                          Patt's      1       2       3       per Patt.
  5      32    8     5    10000     84.38   15.61    0.01      1.1563
  6      64    8     7    10000     65.44   34.56    0         1.3456
  7     128    8    10     1000     33.9    66.1     0         1.661
  8     256    8    13     1000      3.2    96.5     0.3       1.971
  9     512   16    12     1000     70.8    29.2     0         1.292
 10    1024   16    14     1000     34.9    65.1     0         1.651
 11    2048   16    16     1000      3.6    96.4     0         1.964
 12    4096   16    19      200      0     100       0         2.00

Permutations (one-to-one):

  n      N     F    Ss    No. of      No. of Tries (%)        No. Tries
                          Perm's      1       2       3       per Perm.
  5      32    8     5    10000     85.56   14.43    0.01      1.1445
  6      64    8     7    10000     69.65   30.35    0         1.3035
  7     128    8    10     1000     41.5    58.5     0         1.585
  8     256    8    13     1000      3.8    96.2     0         1.962
  9     512   16    12     1000     81.1    18.9     0         1.189
 10    1024   16    14     1000     46.7    53.3     0         1.533

The FLAEM technique works as well as it does
because it satisfies most of its N routing requests on
the first try, leaving only a tiny number of requests
for rerouting. The second try is even more successful
(very few failures are noted in the data) since there
are so many free paths in the network available to
the small number of copies of the rerouted (initially
unsuccessful) requests. Since each try takes 2 passes
through the MATSH control unit (plus an extra final
pass), and a MATSH pass takes O[log N] time, the
apparent small constant bound to the routing tries
indicates the FLAEM technique completes a full pattern
routing in approximately O[log N] time.
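
The timing argument can be made concrete with a rough count. The sketch below recomputes the average number of tries from the percentage columns of three rows of the unrestricted-pattern results, and estimates the number of stage steps as (2·tries + 1) passes of St = Ss + 2 stages each. Charging one time unit per stage, and the helper names, are assumptions made for illustration.

```python
# (n, F, Ss, percent of patterns needing 1, 2, 3 tries), taken from the
# unrestricted-pattern results for n = 5, 8, and 12.
ROWS = [
    (5, 8, 5, (84.38, 15.61, 0.01)),
    (8, 8, 13, (3.2, 96.5, 0.3)),
    (12, 16, 19, (0.0, 100.0, 0.0)),
]

def mean_tries(percents):
    """Average number of tries implied by the percentage distribution."""
    return sum(k * p for k, p in enumerate(percents, start=1)) / 100.0

def total_stage_steps(tries, ss):
    """Two passes per try plus one extra pass, each through Ss + 2 stages."""
    return (2 * tries + 1) * (ss + 2)

for n, f, ss, percents in ROWS:
    print(n, round(mean_tries(percents), 4), total_stage_steps(2, ss), 2 ** n)
```

For n = 12 a two-try routing costs 105 stage steps under these assumptions, compared with N = 4096 simultaneous requests.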

Essential  routing  and  control  aspects  of  inter¬ 
connection  network  design  are  often  not  explored 
in  sufficient  detail  to  ensure  that  these  aspects  will 
not  unduly  limit  system  performance.  Control  issues 
for  the  FIER  optical  network  are  presented  here  so 
that  they  can  be  solved  during  the  design  process. 
The FLAEM routing algorithm was designed to control
a regular simplified combining (or
permutation-restricted) EGS network in a circuit-switched manner
in O[log N] time. This algorithm is a useful new
method for this class of EGS networks, which are
particularly suited for passive optical implementation.

This work was supported in part by AFOSR (Grant
No. F49620-93-1-0437) and ARPA (Grant No. MDA972-94-1-0001).
This paper was presented at the Optical Society of
America Topical Meeting on Optical Computing, Salt Lake
City, Utah, March 1995.

1. Introduction

Optical implementation of logical operations is an important subject in digital optical computing.
For example, optical array logic (OAL), symbolic substitution (SS), binary image algebra (BIA) [3], and
image logic algebra (ILA) [4] are typical methods for implementing optical parallel logic. Recently, optical
implementation of fuzzy logic based on the area-coding technique (ACT) has been proposed. These
methods treat image data as an information medium for parallel processing. An image has inherent features:
not only parallelism for processing but also a visual interface for humans. In short, the availability of
visual techniques leads to the development of new approaches that are inherently visual.

The ACT is a spatial coding technique that represents a gray value by modulating the area of each
pixel in proportion to the original gray value. Digital halftoning is a practical method of rendering the illusion
of continuous-tone images on binary display devices. In the micro font method (MFM), which is one form of
digital halftoning, a gray value is represented by modulating the area of each pixel in proportion to the
original gray value. In the sense of optical computing, the MFM is considered a kind of ACT.
Therefore, we propose the visual area coding technique (VACT) for optical computing, which enables us
to visualize the result by combining the MFM with the technique used in the ACT.

2.  Area  Coding  Technique  (ACT)  and  Micro  Font  Method  (MFM) 

In the ACT, gray value is represented as the area of the transparent part in a pixel as shown in Fig.
1. Processing in the ACT is based on fuzzy logic. Fuzzy logic is an extension of set theoretic
multi-valued logic in which the truth values are represented by linguistic variables. The fundamental operations
in fuzzy logic are maximum, minimum, and negation for fuzzy sets given by Eqs. (1)-(3).

$$\mu(A) \vee \mu(B) = \max[\mu(A), \mu(B)] \qquad (1)$$

$$\mu(A) \wedge \mu(B) = \min[\mu(A), \mu(B)] \qquad (2)$$

$$\neg\mu(A) = 1 - \mu(A) \qquad (3)$$

where μ(A) and μ(B) are fuzzy membership functions representing fuzzy sets; ∨, ∧, and ¬ are the maximum,
minimum, and negation operators, respectively. μ(X) is a function whose value is restricted to
the closed interval [0, 1].

As methods of digital halftoning, pulse amplitude modulation, pulse surface area modulation,
ordered dither, and the micro font method (MFM) have been proposed. Among them, the MFM is
suitable for rendering the detail of a picture. The MFM is a technique that renders a gray value by converting the
value into a specific micro font. In the MFM three typical sets of fonts are utilized: Bayer type, spiral type,
and net type. The Bayer type font set for 17 gray tones is shown in Fig. 2.
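
The link between area-coded fonts and the fuzzy operations of Eqs. (1)-(3) can be illustrated numerically. The sketch below assumes that the 17-tone Bayer font set is generated by thresholding against the standard 4x4 Bayer ordered-dither matrix, so that the fonts are nested, and it replaces the optical superposition by cell-wise AND and OR. Both choices, and all names, are assumptions rather than details taken from the paper.

```python
# Standard 4x4 Bayer ordered-dither index matrix (thresholds 0..15).
BAYER4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]

def encode(level):
    """4x4 binary font for an integer gray level 0..16 (1 = transparent cell)."""
    if not 0 <= level <= 16:
        raise ValueError("level must be in 0..16")
    return [[int(t < level) for t in row] for row in BAYER4]

def decode(font):
    """Gray level = number of transparent cells."""
    return sum(map(sum, font))

def overlay(a, b):
    """Stack two fonts: light passes only where both are transparent (AND)."""
    return [[x & y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def union(a, b):
    """Cell-wise OR of two fonts."""
    return [[x | y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def invert(a):
    """Contrast inversion of a font."""
    return [[1 - x for x in row] for row in a]

a, b = 5, 11
print(decode(overlay(encode(a), encode(b))))  # minimum of the two levels: 5
print(decode(union(encode(a), encode(b))))    # maximum of the two levels: 11
print(decode(invert(encode(a))))              # negation on the 0..16 scale: 11
```

Because the fonts are nested, the overlap area of two coded pixels equals the smaller gray level. Inversion yields the complementary area but not the same font shape.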


[Fig. 1 label: the gray value X (0 ≤ X ≤ 1)]

Fig. 1 Area coding technique

Fig. 2 Bayer type micro font method (17 gray tones)

Note that both the ACT and digital halftoning are considered good examples of digitized
analog processing. Therefore, it is expected that a new digitized analog optical computing technique can
be developed by using their concepts.

3.  Visual  Area  Coding  Technique  (VACT) 

In  the  sense  of  optical  computing,  the  MFM  is  regarded  as  a  kind  of  the  ACT,  because  both  the 
MFM  and  the  ACT  represent  gray  value  by  area  of  individual  pixel  proportional  to  the  original  gray  value. 
We  propose  a  novel  coding  technique,  visual  area  coding  technique  (VACT),  for  optical  parallel 
implementation  of  maximum,  minimum,  and  negation  operations  whose  results  are  visualized  by 
combining  the  ACT  with  the  MFM. 

Figure 3 shows the processing procedure for optical parallel implementation of maximum and
minimum operations with the VACT. First, a given continuous-tone discrete image is transformed into a
halftoned image. An individual pixel datum is coded by referring to a threshold matrix array. Second,
conversion between bright true logic and dark true logic is executed by inverting the contrast of a
halftoned image to realize both maximum and minimum operations. Third, a discrete correlation is
executed between the halftoned image and an operation kernel pattern. This procedure corresponds to
the procedure of shadow casting in the ACT. In bright true logic, the result of the maximum operation can be
obtained just after discrete correlation. In dark true logic, however, the procedure of inversion is required
besides the procedure of discrete correlation. In addition, another set of procedures is required for
negation. Namely, the distribution of the coded pattern is turned upside down after inverting the contrast
of the coded pattern.


[Fig. 3 labels: Correlation; Rearrangement; Output image of negation]

Fig. 3 Processing procedure for optical parallel implementation of fuzzy logic with the VACT.

4.  Experimental  Verification 

To verify the principle of the VACT, we executed an optical experiment of the vector-matrix
composite operation on the fuzzy relation matrix W and input fuzzy vector A' in fuzzy reasoning given by
Eqs. (4) and (5).

$$R = W \times A' \qquad (4)$$

$$r_{ij} = \min\{a'_i, w_{ij}\} \quad (1 \le i \le n,\ 1 \le j \le m) \qquad (5)$$

Figure  4  shows  the  experimental  system.  The  expanded  code  pattern  of  A',  which  is  illuminated 
by  a  plane  wave,  is  superimposed  on  the  coded  pattern  of  W. 
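
As a numerical counterpart to the optical superposition, Eq. (5) can be evaluated directly. The sketch assumes that W is stored as n rows of m entries, that A' has n entries, and that the second index in Eq. (5) runs over columns; the function name `composite` and the sample values are hypothetical.

```python
def composite(W, a):
    """Return R with r[i][j] = min(a[i], W[i][j]) for an n x m matrix W and length-n vector a."""
    if len(W) != len(a):
        raise ValueError("W must have one row per element of a")
    return [[min(a_i, w) for w in row] for a_i, row in zip(a, W)]

# Hypothetical membership values in [0, 1].
W = [[0.2, 0.9],
     [0.5, 0.4]]
A = [0.3, 1.0]
R = composite(W, A)
print(R)
```

Each row of W is clipped at the corresponding element of A', which is what superimposing the expanded code pattern of A' on the coded pattern of W achieves when the overlap area encodes the minimum.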


Fig. 4 Experimental system for vector-matrix composite operation

Figures 5 (a) and (c) show the fuzzy relation matrix W and the coded pattern of W, respectively.
Letters 'a' to 'g' represent the numbers greater than 9 in ascending order. Each value in the coded pattern is
normalized by the maximum value of the fuzzy relation matrix. Figures 5 (b) and (d) show the input fuzzy
vector A' and the expanded code pattern of A', respectively. Figure 5 (e) shows the result of the
vector-matrix composite operation on the matrix W and the vector A'. It is verified that the result can be directly
recognized as a gray-tone image.

Fig. 5 (a) Fuzzy relation matrix W, (b) input fuzzy vector A', (c) coded pattern of W, (d) expanded code
pattern of A', and (e) experimental result of the vector-matrix composite operation on W and A'.


5.  Conclusion 

In this paper we have presented a novel technique, called the visual area coding technique (VACT),
for optical implementation of fuzzy logic with the capability of visualizing the results. This technique
applies the MFM as a kind of ACT. Huge amounts of data processing in fuzzy logic can be achieved
with the VACT. Moreover, real-time visualization of processed results can be achieved with the
VACT.

Robust Light Bullet Dragging Logic

The existence of three-dimensional optical
solitons, which feature simultaneous radially-symmetric
2D spatial self-focusing and temporal
pulse compression, has recently been suggested.
Unlike one- or two-dimensional solitons, light-bullets
are completely localized and are confined
purely by nonlinear effects; they do not require
any static dielectric waveguide, but as a result
cannot take advantage of the interplay of dielectric
confinement and material dispersion to
yield a region of anomalous group-velocity dispersion
(AGVD). AGVD can
be created with linear gratings, by pumping the
media to invert the dispersion relation, and via
parametric gain. Like lower-dimensional solitons,
the spatial profile of a light-bullet is created
by the balance of Kerr self-focusing and diffraction,
while its temporal pulse-shape is determined
by the balance of Kerr pulse-compression
and group-velocity dispersion.

These light-bullets are unstable in a Kerr medium, but the lowest-order soliton
is stable to propagation (for sufficient pulse energy) in materials with
physically reasonable saturating or negative $n_4$ nonlinearities. (Higher-order
solitons require orders-of-magnitude more energy and are likely to be unstable
to angular perturbations.) These non-Kerr media are described by the nonlinear
index functions

$n = n_0 + n_2|E|^2 / (1 + |E|^2/|E_{sat}|^2)$

$n = n_0 + n_2|E|^2 - n_4|E|^4$

for saturating and $n_4$ nonlinearities respectively. It has been predicted that
3D solitons in a saturating Kerr material are stable to propagation if the pulse
energy is above some value. This energy can be derived from the scaling of the
numerically-determined fundamental soliton profiles. Beam propagation confirms
these derivations, in particular that stable light bullets will propagate if the
peak soliton electric field is greater than $E_{sat}$ in the case of a
saturating nonlinearity or greater than \/An2/n4 in the case of the power-series
nonlinearity.
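
To make the two index models above concrete, the following minimal Python sketch evaluates the index change $n - n_0$ for each of them. The parameter values are normalized and purely illustrative (they are assumptions, not values from the paper), and the function names are hypothetical. The sketch only shows the qualitative behaviour: the saturating index change is bounded by $n_2 E_{sat}^2$, while the power-series index change reaches a maximum at $|E|^2 = n_2/(2 n_4)$ and then decreases.

```python
def delta_n_saturating(E, n2, E_sat):
    """Index change n - n0 for the saturating model."""
    return n2 * E**2 / (1.0 + E**2 / E_sat**2)


def delta_n_power_series(E, n2, n4):
    """Index change n - n0 for the n2/n4 power-series model."""
    return n2 * E**2 - n4 * E**4


# Normalized, illustrative parameters (assumed, not taken from the paper).
n2, n4, E_sat = 1.0, 1.0, 1.0

for E in (0.25, 0.5, 1.0, 2.0, 4.0):
    sat = delta_n_saturating(E, n2, E_sat)
    ps = delta_n_power_series(E, n2, n4)
    print(f"E = {E:4.2f}   saturating: {sat:7.3f}   power series: {ps:+9.3f}")

# Upper bound of the saturating index change at very large field.
saturation_limit = n2 * E_sat**2

# Field at which d/dE (n2 E^2 - n4 E^4) = 0, i.e. E^2 = n2 / (2 n4).
E_peak = (n2 / (2.0 * n4)) ** 0.5
print("saturation limit:", saturation_limit, " power-series peak field:", E_peak)
```

In both models the self-focusing strength stops growing with intensity, which is the qualitative property the text relies on for stabilization.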


Figure 1: Spherical BPM simulation of self-focusing 3D Gaussian pulses with
$n_2$ and $n_4$ nonlinearities (= $.5\,n_2/n_4$). The two initial conditions
differ only by a small scale factor, demonstrating the thresholding inherent to
soliton propagation.

These stabilized soliton waves are system attractors: arbitrary pulses not too
far from the soliton profile will form into solitons (see Figure 1), and lower
dimensional envelopes will break up into sets of higher dimensional solitons.

These stable light-bullets are well-matched to the potential application of
all-optical, digital computing. Because solitons exhibit a critical threshold
energy, below which they spread and above which they become self-contained, they
are natural carriers of binary information. Light-bullets carrying this
information occupy small volume with minimal energy because, in the paraxial
limit, light-bullets decrease in size with decreasing energy, in contrast to
one-dimensional temporal or spatial solitons which require greater energy to
create a smaller soliton. Since these small, intense light-bullets require no
static dielectric confinement, a single volume of bulk nonlinear material can
support light-bullet logic gates with three dimensions of parallel operation.

To be applicable to large scale digital logic, an optical gate must have certain
properties including logical completeness, three-terminal operation,
cascadability, gain, high speed, logical fan-in, phase insensitivity, and low
power consumption. A logic gate with these features can be constructed from the
interaction of two initially-coincident light-bullets which are directed at
slightly different angles. If the initial angle and energy ratio of the two
solitons are not too great and the nonlinear force between them is attractive,
the solitons can form a bound, stable pair which propagates at approximately the
mean angle of the individual solitons, weighted by their individual energies
(see Figure 2). This “spatial soliton dragging” interaction can be made
insensitive to the phase of the two solitons by the proper choice of orthogonal
polarizations for the two light-bullets.

Figure 2: Illustration of light-bullet dragging NOT and NOR gates (panels: no
signal, NOT gate, NOR gate). The power supply (pump) soliton is dragged to the
side by the presence of one or more weaker data (signal) solitons.
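
The logical behaviour of the dragging gate can be sketched at a purely Boolean level. The Python below is an abstraction under a stated assumption: the pump reaches the output aperture only when no signal soliton is present, as the Figure 2 caption describes. It does not model the soliton physics, and the function names are hypothetical. It shows why an inverting gate with fan-in is logically complete.

```python
def dragging_gate(*signals: bool) -> bool:
    """Boolean abstraction: the pump exits through the aperture only if
    no signal soliton is present to drag it aside."""
    return not any(signals)


def NOT(a: bool) -> bool:
    return dragging_gate(a)


def NOR(a: bool, b: bool) -> bool:
    return dragging_gate(a, b)


# Other gates built only from the dragging primitive.
def OR(a: bool, b: bool) -> bool:
    return NOT(NOR(a, b))


def AND(a: bool, b: bool) -> bool:
    return NOR(NOT(a), NOT(b))


def XOR(a: bool, b: bool) -> bool:
    return AND(OR(a, b), NOT(AND(a, b)))


for a in (False, True):
    for b in (False, True):
        print(a, b, "NOR:", NOR(a, b), "OR:", OR(a, b),
              "AND:", AND(a, b), "XOR:", XOR(a, b))
```

Because each output is a fresh pump soliton rather than a passed-through signal, every stage in such a composition is also a restoring stage.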

A vector beam propagation simulation of an inverting light-bullet dragging
interaction is shown in Figure 3. This shows dragging of a circularly polarized,
5 μm × 5 μm × 4 μm, 100 pJ pump ($I_p$ = 6 GW/cm²) by a 25 pJ signal
($I_s$ = 1.5 GW/cm²) in the orthogonal circular polarization in about 0.7 mm of
propagation distance using a saturating nonlinearity ($I_{sat} = I_s$) of
n2  =  10“^^m^/V^,  roughly  that  available  from 
PTS. The simulation space is 128 by 64 by 64 samples and is advanced a total of
100 propagation steps.

Figure 4 summarizes the operation of this light-bullet logic gate by plotting
the contrast of the gate versus initial interaction angle and gate length for
saturation- and $n_4$-stabilized light-bullets in isotropic media. (The details
of the simulations are the same as those given above.)

This implements a logical inverter, two of which can be placed in series to
create a logically-complete, two-input NOR gate. Note that although the latency
of the .7 mm gate (for a linear index of 1.5) is 3.5 ps, these operations can be


Figure 3: Interaction of saturation-stabilized light-bullets with initial
interaction angle of 4°. The pulses collide at the boundary of the nonlinear
material and form a bound pair which drags to the side. The pump is dragged out
of the aperture, implementing an ultrafast inverter with gain of 4 and contrast
of 32.

pipelined within the body of the gate so that a single computation would occur
every 200 fs. These gates can be operated in parallel in a uniform block of
nonlinear material (except for the apertures), yielding an (extremely
optimistic) upper
bound  of  2.5  X  10^®  bit  operations  per  cubic  inch 
per  second. 
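
The latency and pipelining figures quoted above follow from simple arithmetic, which the short Python sketch below reproduces. It assumes the latency is simply the transit time of the 0.7 mm gate at a linear index of 1.5, and that one light-bullet enters the gate every 200 fs; the variable names are illustrative.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def gate_latency_s(length_m: float, linear_index: float) -> float:
    """Transit time of a pulse through a gate of the given length."""
    return length_m * linear_index / C


latency = gate_latency_s(0.7e-3, 1.5)   # 0.7 mm gate, n0 = 1.5
slot = 200e-15                          # one computation every 200 fs
bullets_in_flight = latency / slot      # pipeline depth inside one gate
rate_per_gate = 1.0 / slot              # operations per second per gate

print(f"latency           = {latency * 1e12:.2f} ps")
print(f"bullets in flight = {bullets_in_flight:.1f}")
print(f"rate per gate     = {rate_per_gate:.1e} operations/s")
```

The transit time comes out at about 3.5 ps, so roughly 17 light-bullets are in flight within a single gate when it is pipelined at one operation per 200 fs.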

When implemented in such a large system, the timing, alignment, and shape of the
signals will be imperfect and the operation of the gate must be tolerant to such
perturbations. Figure 5 compares the operation of the gate with a perfect signal
(Figure 3) to a gate with positional and temporal misalignments as large as the
soliton 3dB width (top), and signal energy variations from 50% to 200% of the
fundamental. In nearly all cases, the operation of the gate is degraded, but the
performance can always be restored by a small increase in the length of the
gate. To keep all the gates in a circuit functioning within these tolerances, it
is essential that the logic gates restore, both logically and physically, the
energy,
Figure 4: Contrast (energy of fundamental over pump energy leaked through a
10 μm square aperture) versus interaction angle and propagation distance for two
isotropic nonlinear media: a) saturation, circular polarizations; b) $n_4$,
circular polarizations.

position, timing, angle, and polarization so that errors do not grow or
propagate. This is a natural consequence of the three-terminal, inverting nature
of LBDL gates.

To make good use of these highly parallel logic devices, systolic arrays or a
similar three-dimensional data-flow technique need to be developed. The
possibility of constructing all-optical, light-bullet dragging logic circuits
with millions of gates operating at THz clock speeds is strong motivation for
the continued materials, theoretical, and systems research necessary to realize
these devices.
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Figure  5:  Tolerance  of  LBDL  to  misalignments  and 
mistimings  of  signals  (top)  and  energy  variations  of 
signals  (bottom).  These  simulations  demonstrate 
that  the  operation  of  the  gate  can  be  made  tolerant 
to  real-world  system  variations  by  a  small  increase  in 
the  gate  length. 
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Digital  optical  pipeline  cellular  automata  arithmetic  unit 
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The  arithmetic  unit  presents  a  substantial  challenge  to  those  interested  in  the  long  term  goal  of 
ultrafast  all-optical  general  purpose  computers  (1  Ch.  10).  Previously  we  demonstrated  an 
optical  adder  using  electron  trapping  materials  for  which  the  speed  seems  to  be  limited  to 
hundreds of nanoseconds (2). The multiplication of images in 160 fs was recently demonstrated
by  means  of  four-wave  mixing  in  a  new  polymer  material  (3).  We  present  a  conceptual  method 
of  using  such  a  material  in  a  loop  to  perform  pipeline  digital  arithmetic  operations  such  as 
addition  and  multiplication.  Only  the  word  operands  are  entered  at  each  cycle  while  the  loop 
performs  2-D  operations  so  that  the  rate  of  computation  in  the  polymer  is  several  orders  of 
magnitude  higher  than  that  for  data  entry  and  removal.  The  method  uses  a  modification  of  the 
transition  function  proposed  previously  for  computation  with  cellular  automata  or  symbolic 
substitution (4), (1 Ch. 15). Cellular automata on an infinite plane were shown by von
Neumann to provide universal-constructor machines capable of endlessly self-reproducing new
Turing  machines,  each  of  which  can  compute  anything  that  can  be  computed  by  logical  or 
mathematical  reasoning  (5,6).  Others  have  subsequently  provided  rules  for  such  mappings  (7). 
Flexibility  is  achieved  because  the  operation  performed  may  be  changed  by  replacing  an  optical 
control  image.  This  idea  of  treating  control  information  optically  in  the  same  manner  as  data  has 
been  highly  developed  in  pattern  logic  which  has  been  experimentally  demonstrated  for  an  optical 
ripple-carry  adder  (8, 1  Ch  9  and  10).  The  correlation  operation  required  for  the  cellular  automata 
is  performed  by  four  wave  mixing  and  is  independent  of  the  control  information,  the  data,  and 
their locations on the array. Mappings of a full adder and a 3-bit multiplier onto such a
computer are shown. The problems of achieving short pipelines for small latency are discussed
and  alternative  possible  improvements  mentioned.  A  proposed  optical  set  up  is  shown  which 
includes  a  loop  around  a  four  wave  mixing  experiment  such  as  that  in  paper  (3).  Some 
difficulties  anticipated  in  performing  such  an  experiment  are  considered. 

Mapping  of  arithmetic  operations  to  cellular  automata 

We  describe  the  transition  rule  used  and  show  how  it  can  be  used  to  align  data  and  perform  logic 
operations  as  the  data  progresses  up  through  the  cellular  automata  plane.  These  techniques  are 
then  shown  to  provide  a  full  adder  and  a  3-bit  multiplier. 

Transition  rule  and  basic  operations.  Figure  1  shows  the  transition  rule  we  use  which  is  a 
modification  of  a  rule  introduced  by  earlier  researchers  (3).  The  rule  is  selected  in  this  way 
because  we  are  only  interested  in  propagation  of  information  upward  in  the  array.  New  inputs 
representing  operands  are  entered  at  the  bottom  edge  of  the  plane  and  the  results  are  extracted 
at  the  top  edge. 

With this rule we can move data around on a 2-D cellular automata plane. For example, figure 2
shows the fixed control pattern for a fork of a signal into two directions. Figure 3 shows a
crossover. We see how information may be moved sideways across the plane to the left or right
from the fork operation in fig. 2. We can also perform logic operations with this rule; for
example, an OR is shown in figure 4 and an AND in figure 5. A mapping of an arithmetic
operation to the plane would tend to have alternating regions of data movement and logic
computation as shown in figure 6.

Fig 1 Rule

Fig 2 Fork

Fig 3 Crossover


Mapping of a full adder. A full adder computes the sum s and the carry c for each bit:

$s = a \oplus b \oplus c$
$c = a \cdot b + a \cdot c + b \cdot c$

where $\oplus$ represents the "exclusive or" operation XOR. Figure 7 shows the mapping of the
control pattern for a full adder using the basic operations. The number of steps in the
pipeline is reduced for illustration by omitting the alignment steps.

Fig 5 AND

Fig 6 Plane


The XOR pattern at the bottom left is for performing $A \oplus B$ between two inputs A and B.
It computes the XOR from $A \cdot B' + A' \cdot B$, where $A'$ represents the complement of A.
The second XOR pattern from the left at the bottom computes $(A \oplus B)'$ from
$A \cdot B + A' \cdot B'$. The third pattern passes C and $C'$ through. The top XOR pattern at
the left computes $A \oplus B \oplus C$ to give the output sum for the full adder. It computes
this from $(A \oplus B) \cdot C' + (A \oplus B)' \cdot C$. The second pattern from the left at
the top computes $(A \oplus B \oplus C)'$ from $(A \oplus B)' \cdot C' + (A \oplus B) \cdot C$.
This is the complement of the sum for the full adder. The second pattern from the right
computes the carry. The carry is computed from $A \cdot B + B \cdot C + A \cdot C$ at the
bottom. The complement of the carry is computed at the far right from
$(A' + B') \cdot (B' + C') \cdot (A' + C')$.
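
The dual-rail expressions above can be checked exhaustively. The following Python sketch is only a truth-table verification of the Boolean identities quoted in the text, not a model of the control patterns in Figure 7; the function name is hypothetical.

```python
from itertools import product


def full_adder_patterns(a: int, b: int, c: int):
    """Evaluate the sum, carry and their complements using only AND, OR
    and complemented inputs, as in the expressions quoted in the text."""
    na, nb, nc = 1 - a, 1 - b, 1 - c
    x = (a & nb) | (na & b)            # A xor B    = A.B' + A'.B
    nx = (a & b) | (na & nb)           # (A xor B)' = A.B  + A'.B'
    s = (x & nc) | (nx & c)            # sum
    ns = (nx & nc) | (x & c)           # complement of the sum
    carry = (a & b) | (b & c) | (a & c)
    ncarry = (na | nb) & (nb | nc) & (na | nc)
    return s, ns, carry, ncarry


for a, b, c in product((0, 1), repeat=3):
    s, ns, carry, ncarry = full_adder_patterns(a, b, c)
    assert a + b + c == 2 * carry + s      # arithmetic meaning of a full adder
    assert ns == 1 - s and ncarry == 1 - carry
    print(a, b, c, "-> sum", s, "carry", carry)
```

Carrying each signal together with its complement is what lets the mapping avoid an explicit NOT operation in the plane.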


Mapping of a 3-bit multiplier. A 3-bit multiplier may be constructed from half (HA) and full
adders (FA) as shown in figure 8. The mappings for these adders may be used to provide a
mapping for a 3-bit multiplier, not shown here for lack of space. One of the difficulties is
that many vertical steps are required to align the outputs from one stage to the next, making
the pipeline long. A different transition rule can be used for alignment to provide much
larger transverse shifts. An interconnection strategy such as power-of-two shifts could also
be used.
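
As an illustration of how half and full adders combine into a 3-bit multiplier, the Python sketch below uses one standard column-by-column arrangement with three half adders and three full adders. The exact wiring of figure 8 is not reproduced in the text, so this arrangement is an assumption, and all names are hypothetical.

```python
def half_adder(a: int, b: int):
    """Return (sum, carry) of two bits."""
    return a ^ b, a & b


def full_adder(a: int, b: int, c: int):
    """Return (sum, carry) of three bits."""
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, c)
    return s, c1 | c2


def multiply_3bit(x: int, y: int) -> int:
    """Multiply two 3-bit numbers using only AND gates, HAs and FAs."""
    a = [(x >> i) & 1 for i in range(3)]
    b = [(y >> i) & 1 for i in range(3)]
    # Partial product pp[i][j] has binary weight 2**(i + j).
    pp = [[a[i] & b[j] for j in range(3)] for i in range(3)]

    p0 = pp[0][0]                                   # weight 1
    p1, c1 = half_adder(pp[1][0], pp[0][1])         # weight 2
    s2, c2a = full_adder(pp[2][0], pp[1][1], c1)    # weight 4
    p2, c2b = half_adder(s2, pp[0][2])
    s3, c3a = full_adder(pp[2][1], pp[1][2], c2a)   # weight 8
    p3, c3b = half_adder(s3, c2b)
    p4, p5 = full_adder(pp[2][2], c3a, c3b)         # weights 16 and 32

    bits = [p0, p1, p2, p3, p4, p5]
    return sum(bit << k for k, bit in enumerate(bits))


print(multiply_3bit(5, 6))
```

Each carry has to be routed to the next column before it can be used, which is the source of the alignment steps that lengthen the pipeline.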
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Fig  8  3 -bit  multiplier 


Optical  implementation 


The four wave mixing experiment in (2) is modified to perform a correlation in a loop. Fig. 9
shows the configuration proposed. A reference beam and the Fourier transform of the
correlation pattern for the rule, obtained with lens L1, form the interference pattern for
four wave mixing in the polymer. The fixed control pattern, like those shown for the full
adder and multiplier, is entered on LCLV1. The Fourier transform of this, formed using L2, is
used to read the pattern in the polymer. The product of the two transforms is then inverse
Fourier transformed with lens L3. An optical threshold device passes only correlation peaks.
Such devices are a challenge for ultrafast optics. The hologram is used to create the new
pattern of two dots for every correlation peak as required by the transition rule. An optical
laser amplifier is required to compensate for losses in the loop. The two input word operands
are inserted as a row in the plane at the lower left of the figure. The output word is
detected at the upper left. An addition or multiplication is obtained every cycle. The
operation of the unit is switched from addition to multiplication by changing the fixed
control pattern, providing a high level of flexibility.
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This  paper  describes  the  design  of  an  optoelectronic  graphics  display  processor.  The  processor  has 
the  advantages  of  simplicity  and  extremely  high-speed  generation  of  computer  graphic  images.  It 
achieves  its  speed  by  processing  all  the  pixels  of  an  image  in  parallel.  It  operates  by  accumulating 
polygons,  the  primitive  shapes  of  which  all  objects  in  the  scene  are  composed.  A  front-end 
processor,  not  part  of  this  system,  is  responsible  for  generating  the  coordinates  and  color  or  gray 
shade  of  each  polygon  and  passing  that  information  to  the  processor  described  in  this  paper.  The 
processor  has  the  capability  of  generating  all  the  pixels  of  any  arbitrary  polygon  in  constant  time.  It 
accumulates  all  the  polygons  in  a  scene  in  a  frame  buffer,  and  when  the  frame  buffer  contains  a 
complete  image,  it  is  available  for  display. 

The  Computer  Graphics  Process 

The  sequential  process  of  converting  from  3-D  object  descriptions  to  properties  of  individual  pixels 
is  referred  to  as  the  graphics  pipeline.  The  processor  described  in  this  paper  performs  the 
rasterization  process.  That  is,  it  accepts  the  coordinates  of  polygons  from  a  front-end  processor,  and 
is responsible for scan conversion, visible surface determination, and shading [1].

SYSTEM  OPERATION 

Implementation  Domain 

The  processor  we  will  describe  is  implemented  with  optoelectronic  integrated  circuits,  OEICs,  that 
have  optical  inputs  and  optical  outputs,  with  electronic  processing  internally.  Each  OEIC  contains 
NxM  processing  elements,  PEs;  that  is,  sufficient  PEs  to  manipulate  an  entire  graphics  frame 
simultaneously.  The  OEICs  are  interconnected  in  free  space  by  holographic  optical  elements, 
HOEs.  HOEs  are  used  to  direct  or  route  signals  from  one  OEIC  to  the  next.  Information  flows 
between  the  processing  elements  in  optical  form,  and  is  processed  inside  the  PEs  electronically. 

System  components 

The  most  important  OEIC  in  the  system  is  the  Programmable  Optoelectronic  Logic  Array,  POLA. 
The  POLA  contains  NxM  identical  PEs;  however,  as  Figures  1  and  2  show,  each  PE  contains  not 
only  two  data  inputs  and  one  data  output,  but  also  three  control  inputs.  Depending  on  the  value 
presented  to  the  control  inputs,  the  PE  can  be  configured  to  perform  a  number  of  different 
operations  on  its  inputs.  A  probable  set  useful  in  supporting  graphics  applications  would  be  AND, 
NAND,  OR,  NOR,  and  an  S-R  or  D  latch,  though  the  figure  shows  only  two  control  inputs  and  four 
selectable  functions.  POLA  gates  are  used  for  all  data  manipulation  and  storage.  System  control  is 
accomplished  by  applying  the  appropriate  control  signals  to  the  control  inputs  of  the  POLA  gates. 
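
A software model may help clarify how one array can act as ALU or storage depending on its control inputs. The Python sketch below is only illustrative: the 3-bit control codes, the class names, the choice of an S-R latch, and the assumption that one control word is broadcast to every PE are all hypothetical, since the text does not specify the encoding.

```python
from dataclasses import dataclass

# Hypothetical 3-bit control codes; the paper does not give the encoding.
AND, NAND, OR, NOR, SR_LATCH = 0b000, 0b001, 0b010, 0b011, 0b100


@dataclass
class ProcessingElement:
    state: int = 0  # used only when the PE is configured as a latch

    def step(self, control: int, a: int, b: int) -> int:
        if control == AND:
            return a & b
        if control == NAND:
            return 1 - (a & b)
        if control == OR:
            return a | b
        if control == NOR:
            return 1 - (a | b)
        if control == SR_LATCH:
            # a acts as Set, b as Reset; S = R = 1 is treated as "hold" here.
            if a and not b:
                self.state = 1
            elif b and not a:
                self.state = 0
            return self.state
        raise ValueError(f"unsupported control code {control:#05b}")


class POLA:
    """An N x M array of identical PEs driven by one broadcast control word."""

    def __init__(self, rows: int, cols: int):
        self.pes = [[ProcessingElement() for _ in range(cols)]
                    for _ in range(rows)]

    def step(self, control, plane_a, plane_b):
        return [[pe.step(control, a, b) for pe, a, b in zip(pe_row, ra, rb)]
                for pe_row, ra, rb in zip(self.pes, plane_a, plane_b)]


pola = POLA(2, 2)
A = [[1, 0], [1, 1]]
B = [[1, 1], [0, 1]]
print(pola.step(AND, A, B))   # element-wise AND of two bit planes
print(pola.step(NOR, A, B))   # same hardware, different control word
```

The same array handles a whole bit plane per cycle, so the cost of an operation does not depend on the number of pixels.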
Fig 1 and 2. POLA Structure and Functionality
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Figure  3  shows  an  architecture  for  the  graphics  display  processor  using  five  POLAs  and  beam 
splitters.  There  are  three  buffer  POLAs  shown  in  the  figure:  PB,  the  polygon  buffer,  where  polygons 
are  assembled,  HB,  the  hyperplane  buffer,  where  the  hyperplanes,  defined  below,  are  generated,  and 
FB,  the  frame  buffer,  where  polygons  are  accumulated  to  form  the  frame. 

The system works by operating on the frame in the ALU POLA and cycling the result back to
one of the register POLAs. This repeated cycling of information through a single processing
element whose functionality may be altered between cycles has been referred to by A. Huang as
computational origami [2]. The main aspects of the control unit have been described previously [3].
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System  Operation 

Polygons are generated by the controller by accumulating hyperplanes in a hyperplane buffer. A
hyperplane is a half plane that extends from a given line indefinitely in a given direction.
Figure 4 shows three hyperplanes, a, b, and c, intersected with a Boolean AND to form a
polygon, d. The controller generates a specified hyperplane by illuminating a hologram that
contains an image of a hyperplane that has its defining line at the same angle as the desired
hyperplane. Figure 5 shows the hyperplanes described by the hologram array for an
NxM = 5x5 pixel array:


Figure 4. Three hyperplanes (a, b, c) intersecting to form a polygon (d).


Figure  5.  Holographic  array  of  hyperpla 
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Figure  6.  Selecting  the  proper  hyperplane, 
and  projecting  it  at  the  proper 


In general, 2N-1 holograms are required to implement an NxN hyperplane generator. Figure 6
shows how the controller selects the proper hologram, and how it projects it to the proper
place on the HB. The illuminating array is an array of NOR gates. The column selects the
hologram containing a hyperplane of the correct angle, and the row selects where to project it
on the hyperplane buffer, HB. Once the hyperplane has been projected on and stored in HB, it
is then ANDed with the accumulating polygon stored in PB. Once the polygon is complete, which
will take k cycles for a k-sided polygon, the complete polygon is ORed into the frame buffer.
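
The accumulate-by-AND, commit-by-OR cycle described above can be sketched in software. In the Python below the half planes are computed from a line inequality rather than selected from a hologram array, which is a substitution made purely for illustration; the buffer names follow the text (PB, FB), and everything else is hypothetical.

```python
N = 8  # illustrative frame size (N x N pixels)


def half_plane(a: int, b: int, c: int, n: int = N):
    """Bit plane with 1 at every pixel (x, y) satisfying a*x + b*y <= c."""
    return [[1 if a * x + b * y <= c else 0 for x in range(n)]
            for y in range(n)]


def plane_and(p, q):
    return [[u & v for u, v in zip(r, s)] for r, s in zip(p, q)]


def plane_or(p, q):
    return [[u | v for u, v in zip(r, s)] for r, s in zip(p, q)]


def render_polygon(half_planes, frame_buffer):
    """AND one half plane per cycle into PB, then OR the result into FB."""
    pb = [[1] * N for _ in range(N)]        # polygon buffer starts full
    for hb in half_planes:                  # k cycles for a k-sided polygon
        pb = plane_and(pb, hb)
    return plane_or(frame_buffer, pb)


# Triangle bounded by x >= 1, y >= 1 and x + y <= 8.
triangle = [half_plane(-1, 0, -1), half_plane(0, -1, -1), half_plane(1, 1, 8)]
fb = render_polygon(triangle, [[0] * N for _ in range(N)])
for row in fb:
    print("".join("#" if v else "." for v in row))
```

Each loop iteration touches every pixel at once in the optical system, which is why the time per polygon depends on the number of sides and not on its area.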

System  Performance 


There is good reason to believe that the basic clock rate of the machine described above would
be in the 100 MHz range; we have implemented a 300 MHz counter employing technology similar to
that proposed here [4]. Allowing 25 ns for polygon accumulation and other housekeeping chores,
a polygon can be generated every 100 ns, for a raw rate of 10 million polygons per second.
This would result in a frame rate of 100 frames per second for frames of 100,000 polygons.
Processing a gray scale or color image would slow this down by a factor of roughly 100,
because of the bit-serial nature of the operations. The rate could be brought up, however, by
employing additional OEICs. The architecture can be extended to 3-D by operating
frame-bit-serially.
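
The throughput figures above follow directly from the 100 ns polygon time. The short Python sketch below reproduces the arithmetic; the variable names are illustrative, and the gray-scale frame rate is simply the binary rate divided by the factor of roughly 100 quoted in the text.

```python
NS_PER_SECOND = 1_000_000_000

polygon_time_ns = 100           # one polygon every 100 ns
polygons_per_frame = 100_000    # scene complexity used in the text
bit_serial_slowdown = 100       # rough factor for gray scale or color

polygons_per_second = NS_PER_SECOND // polygon_time_ns
binary_frames_per_second = polygons_per_second / polygons_per_frame
gray_frames_per_second = binary_frames_per_second / bit_serial_slowdown

print(f"{polygons_per_second:,} polygons/s")
print(f"{binary_frames_per_second:.0f} binary frames/s")
print(f"about {gray_frames_per_second:.0f} gray-scale frame/s on one set of OEICs")
```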

References 

1. Foley, J.D., et al., Computer Graphics: Principles and Practice. 2nd ed. 1990, Reading, Mass.: Addison-Wesley.

2. Huang, A., Computational Origami - The Folding of Circuits and Systems. Applied Optics, 1992. 31(26): p. 5419-5422.

3. Heuring, V.P. and V. Morozov, A Matrix ALU for Optical Computing, in Proc. Soc. Photo Instr. Eng. 1992. San Diego, CA.

4. Heuring, V.P. and L.H. Ji, Toward a parallel optoelectronic computer: A 300 MHz optoelectronic Counter. Applied Optics - to appear in Nov 1994 Issue.




A  Constant-Time  Parallel  Sorting  Algorithm  and  its  Optical 
Implementation  Using  Smart  Pixels 

Ahmed  Louri,  James  A  Hatch  Jr.,  and  Jongwhoa  Na 
Department  of  Electrical  and  Computer  Engineering 
University  of  Arizona 
Tucson,  AZ  85721 


1.  Introduction 

Sorting is a fundamental operation used for many symbolic, numeric, and
artificial intelligence (AI) tasks [1]. Because of its importance, there has
been a great deal of work on developing and analyzing sorting algorithms and
architectures [2]. In this paper, we present a novel constant-time parallel
sorting algorithm and an efficient optical implementation capable of both
determining the positions of the sorted data elements and physically reordering
them in O(1) time steps. It uses photonics for highly parallel interconnects and
optoelectronics, in the form of “smart pixels”, for processing. Thus, it
exploits the advantages of both the optical and electrical domains.

2.  A  Constant-time  Parallel  Sorting  Algorithm 

To illustrate the algorithm, consider an example of sorting a vector
a = [7 8 2 8 5].

Step 1: Given the input row vector a, generate an n × n matrix A ($A^T$) by
vertically (horizontally) spreading a ($a^T$) n times. As an illustration, for
an input vector a = [7 8 2 8 5], we generate A and $A^T$ as follows:

$$A = \begin{bmatrix} 7 & 8 & 2 & 8 & 5 \\ 7 & 8 & 2 & 8 & 5 \\ 7 & 8 & 2 & 8 & 5 \\ 7 & 8 & 2 & 8 & 5 \\ 7 & 8 & 2 & 8 & 5 \end{bmatrix}, \qquad
A^T = \begin{bmatrix} 7 & 7 & 7 & 7 & 7 \\ 8 & 8 & 8 & 8 & 8 \\ 2 & 2 & 2 & 2 & 2 \\ 8 & 8 & 8 & 8 & 8 \\ 5 & 5 & 5 & 5 & 5 \end{bmatrix}$$

Step 2: Compare every element of a with every element of $a^T$ by computing the
difference matrix $D = A + (-A^T)$.

Step 3: Generate the U matrix, which is subtracted from D in Step 4 to resolve
non-unique ranks.

Step 4: Resolve non-unique ranks by computing matrix $D' = D + (-U)$.

Step 5: Generate R by thresholding $D'$, where $r_{ij} = 1$ iff
$d'_{ij} \geq 0$.
Step 6: Form the rank vector, r, by summing each column of the matrix R.

7 rank : 3
8 rank : 4
2 rank : 1
8 rank : 5
5 rank : 2

The result from the algorithm is the generation of the rank vector,
r = [ 3 4 1 5 2 ], which contains the positions of each of the data elements in
the sorted output. This algorithm and an accompanying optical system are
complete when they are capable of rearranging the input data to the order
reported in r. The next two steps describe physical reordering of the input
vector a.

Step 7: Compare every element of r to every element of $[1, \ldots, n]^T$ by
expanding both by n and subtracting the latter from the former to form the S
matrix.

Step 8: Reorder the sorted data by the use of S to select/discard A, where
$S_{ij} = 0$ indicates that data element $A_{ij}$ should be transferred to row i
in the sorted output.

Thus, the problem of reordering the data reduces to selecting the appropriate
element from each row of A, since each row of A has a copy of each data element,
and discarding the rest.
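
The eight steps can be followed in a short sequential sketch. The Python below is an illustration under stated assumptions, not the authors' implementation: the definition of U is not legible in this text, so the sketch assumes $u_{ij} = 1$ for $i > j$ and 0 otherwise, together with a threshold of $d'_{ij} \geq 0$, which reproduces the ranks of the worked example for integer data. Every matrix operation that the optical system performs in parallel is written here as an ordinary loop.

```python
def constant_time_sort(a):
    """Rank-based sort following Steps 1-8; returns (rank vector, sorted data).

    Assumes integer data so that the unit tie-break in U is sufficient.
    """
    n = len(a)
    # Step 1: spread a vertically (A) and a^T horizontally (AT).
    A = [list(a) for _ in range(n)]                   # A[i][j]  = a[j]
    AT = [[a[i]] * n for i in range(n)]               # AT[i][j] = a[i]
    # Step 2: difference matrix D = A + (-A^T).
    D = [[A[i][j] - AT[i][j] for j in range(n)] for i in range(n)]
    # Step 3 (assumed form of U): break ties in favour of the earlier element.
    U = [[1 if i > j else 0 for j in range(n)] for i in range(n)]
    # Step 4: D' = D + (-U).
    Dp = [[D[i][j] - U[i][j] for j in range(n)] for i in range(n)]
    # Step 5: threshold D' to obtain R.
    R = [[1 if Dp[i][j] >= 0 else 0 for j in range(n)] for i in range(n)]
    # Step 6: the rank of a[j] is the sum of column j of R.
    r = [sum(R[i][j] for i in range(n)) for j in range(n)]
    # Step 7: S[i][j] = r[j] - (i + 1).
    S = [[r[j] - (i + 1) for j in range(n)] for i in range(n)]
    # Step 8: in row i keep the element of A whose entry in S is zero.
    out = [next(A[i][j] for j in range(n) if S[i][j] == 0) for i in range(n)]
    return r, out


ranks, ordered = constant_time_sort([7, 8, 2, 8, 5])
print("rank vector:", ranks)
print("sorted data:", ordered)
```

For the example vector this yields the rank vector [3, 4, 1, 5, 2] given in the text. The sequential version takes O(n²) operations; the constant-time claim relies on all n² comparisons being performed simultaneously in the optical system.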

3.  Optical  Implementation  of  the  Constant-time 
Parallel  Sorting  Algorithm 

We  will  now  consider  an  optical  system  that  implements 
the  above  steps  and  physically  reorders  the  input  data 
in  constant  time. 

A.  Generating  the  Rank  Vector  r 

a.  Implementation  of  Step  1  of  the  Algorithm 

As shown in Fig. 1, the one-dimensional (1-D) input, a, modulates the columns of
2-D laser array V1 of wavelength $\lambda_1$ to form the A array. Meanwhile,
$a^T$ modulates the rows of a 2-D laser array V2 of wavelength $\lambda_2$ to
form the $-A^T$ array (where the negative sign is inherent in the use of
$\lambda_2$).

b.  Implementation  of  Step  2  of  the  Algorithm 

The difference array D of Step 2 is formed by “summing” arrays A and $-A^T$.
This is performed optically by merging and interlacing the optical data planes
so that corresponding values are side-by-side. In Fig. 1, each element of the D
array contains two numbers that represent the interlacing of the two colors. The
number in the upper right corner represents the intensity level of the
$\lambda_2$ light component while the number in the lower left corner represents
the $\lambda_1$ light component.

c. Implementation of Steps 3, 4, 5, and 6 of the Algorithm

Fig. 2 illustrates the implementation of Step 3 and part of Step 4. The $-U$
array of Step 3 is formed by modulating a 2-D laser array (not shown) of
wavelength $\lambda_2$. The summation of D and $-U$ in Step 4 is performed by
the beamsplitter BS2 in Fig. 2. Fig. 3 illustrates, for a single pixel, the
subtraction of the absolute values in $D'$. Notice the integration of the
photodetectors, modulation electronics, and the surface-emitting laser in this
close-up view of a single smart pixel. The two light components with wavelengths
$\lambda_1$ and $\lambda_2$ from a pixel of the $D'$ array impinge onto the
photodetectors residing within the smart pixel array. The op-amp subtracts the
detected value of $\lambda_2$, $V(\lambda_2)$, from the detected value of
$\lambda_1$, $V(\lambda_1)$, where the notation V() denotes the detected
voltages corresponding to the incident light levels. The output is then
thresholded by a CMOS gate (not shown). The digital output from the thresholding
operation of Step 5 then modulates the surface-emitting laser for communication
to the next stage.

Fig. 4 illustrates Steps 4, 5, and 6 on a full scale where the $D'$ array is
being viewed from behind. The output of the electronic subtraction and
thresholding of $D'$ by smart pixel array SP1 modulates the surface-emitting
lasers to generate the R array. Since the lasers are integrated on the same side
of the substrate as the photodetectors, the R array propagates back into the
system and passes through half-wave plate HWP1. HWP1 rotates the polarization of
the light from the smart pixel array so that it will be entirely reflected from
the polarizing beamsplitter PBS1. The polarizing beamsplitter reduces the power
loss of the system while also preventing backward propagation of light from SP1.
The half-wave plate can be eliminated from the system if the lasers on SP1 are
orthogonally polarized with respect to V1 and V2. The cylindrical lens
vertically sums the ones in the R matrix to form the rank vector, r, in
accordance with Step 6. Thus, the rank vector r is generated.

B. Physical reordering of the input data

In addition to the generation of the rank vector, an equally important aspect of
this paper is the physical reordering of the input. This has received little
attention previously. The optical system in Fig. 5 performs the final step of
the algorithm. Here, we use a second smart pixel array to select the appropriate
element from each row of the A array.

As shown in Fig. 5, each pixel of SP2 consists of a
photodetector, comparison logic, laser driver electronics,
and a surface-emitting laser. The photodetector receives
the optical power from each pixel of the A array. The
comparison logic selects the appropriate element from
each row of the A array as outlined in Steps 7 and 8.
The r vector and the array $[1\ 2\ 3\ 4\ 5]^T$ are vertically
and horizontally spread by writing them to the column
and row addressing lines of SP2, respectively. The array
$[1\ 2\ 3\ 4\ 5]^T$ can be easily implemented by a resistive
network integrated onto the device substrate. The r vector
is imaged onto an integrated 1-D photodetector array
which is internally connected to the column addressing
lines. If the two input signals to the comparison logic
are identical, corresponding to the condition $S_{ij} = 0$
in Eqn. 6, the output enables the laser driver so that
the optical intensity level detected at the photodetector
can be regenerated by the surface-emitting laser. If the
two signals are not identical, i.e. $S_{ij} \neq 0$, the output
of the comparison logic disables the laser driver so that
no light is generated. The selected elements of A are
focused to a vertical line at the focal plane of cylindrical
lens CL2. Thus, the system implements Eqn. 7, the
physical reordering of the sorted data.
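
The selection logic described above can be illustrated with a small numerical sketch. It is not the authors' implementation: the comparison rule used to build the rank vector and the form $S_{ij} = r_j - i$ are assumptions chosen to be consistent with the text (Eqns. 6 and 7 are not reproduced in this section), and the function names are hypothetical. Distinct input values are assumed so that every rank is unique.

```python
def rank_vector(a):
    """Rank of each element (1 = smallest) by comparison counting."""
    n = len(a)
    # R[i][j] = 1 where a[i] <= a[j]; an assumed stand-in for the thresholded R array.
    R = [[1 if a[i] <= a[j] else 0 for j in range(n)] for i in range(n)]
    # Summing each column plays the role of the cylindrical lens that sums the ones.
    return [sum(R[i][j] for i in range(n)) for j in range(n)]


def reorder(a, r):
    """In row i, pass only the element whose rank equals i, then collapse the row."""
    n = len(a)
    out = []
    for i in range(1, n + 1):  # row index taken from the array [1 2 ... n]^T
        # Assumed comparison: S_ij = r_j - i; the laser is enabled only where S_ij = 0.
        row = [a[j] if r[j] - i == 0 else 0 for j in range(n)]
        out.append(sum(row))   # CL2 focuses each row onto a single spot
    return out


a = [7, 2, 9, 4, 5]
r = rank_vector(a)
sorted_a = reorder(a, r)
```

Each row of the array passes exactly one element, so collapsing the rows yields the input values in rank order.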
Figure 1: The optical implementation of Steps 1 and 2 of the algorithm.
Figure 2: The optical implementation of Steps 3 and 4 of the algorithm.
Figure 3: The optical implementation of the actual subtraction of Steps 4 and 5.
Figure 4: The optical system for implementing Steps 4, 5, and 6 of the algorithm.
Analysis of a 3-D Computer Optical Scheme with Bi-Directional Interconnects

Optoelectronic Computing Systems Center
University  of  Colorado 
Boulder,  CO  80309-0525  303-492-0478 

MOTIVATION 

Computer  interconnections  based  on  optics  have  an  advantage  over  electronic  connections 
due  to  the  ability  of  light  to  travel  through  space  without  interference  with  other  light  beams. 
Highly  parallel  computers  will  require  highly  parallel  communications,  and  such  communications 
cannot  be  supported  with  conventional  electronic  implementations  because  of  the  technological 
limitations of electrical interconnection in terms of area, latency, and power dissipation [1]. In
Fig. 1 the conceptual design of a 3-D computer system with bi-directional interconnects is
presented. This interconnection system allows for feedback of the signals within the computer
system, which is an essential part of a von Neumann machine as well as an important aspect of
many algorithms, from the fast Fourier transform to polynomial evaluation. This optoelectronic
3-D system is being pursued by the Optoelectronic Computing Systems Center (OCSC) at the
University of Colorado to prove the utility and viability of 3-D computers [2].


Figure  1.  Conceptual  design  of  the  3-D  Computer. 

SUMMARY  OF  RESULTS 

In Fig. 2 an optical schematic of a 3-D computer consisting of two arrays is presented [3].
We developed a modeling tool for a 3-D computer based on the class of systems presented in
Fig. 2. The model accounts for all the processes needed to estimate cross-talk and noise in the
detector plane correctly. The following phenomena affect the feasibility of a 3-D
computer: wavelength variation in the diode laser array, cross-talk due to diffraction in the detector
plane, cross-talk due to the spreading of the beam in the holographic plane, scattering in the
holographic plane, aberrations in the optical system, misalignment of the diode laser array, and
tolerance to spatial and angular displacement of the holographic array and detector array. The
reliability of a 3-D computer is affected by the diode laser beam intensity fluctuation, the on/off
ratio in the diode laser output intensity, and intrinsic noise in the photodetector (shot noise, dark
current, Johnson noise). Results from some of these studies are described below.


Figure 2. Optical schematic of a 3-D computer. The hologram and detector arrays are placed
in the front and back focal planes.


Figure 3. Distortion of the signal spot location relative to the paraxial approximation as a
function of the hologram offset and the propagation angle for the Melles Griot LAT 011 lens.
(Axis: hologram entrance offset, mm.)

The 3-D interconnect optical scheme depicted in Fig. 2 is a bi-directional system which places
great demands on the optics used. The lens, for instance, must be symmetrical and therefore have a
1:1 conjugate ratio. However, hologram reconstruction requires a Fourier transform, which is best
accomplished with a non-symmetric lens with an infinite conjugate ratio. Use of a symmetric lens
causes distortions in the reconstruction of the hologram, which shift the interconnect spots from
the desired location. In Fig. 3 the results of the aberration analysis are plotted for the Melles Griot
LAT 011 lens. Since the distortions of the lens are known, they can be compensated through a
"pre-distortion" procedure at the hologram array design step.


In  Fig.4  the  contrast  ratio  at  the  detector  is  shown  as  a  function  of  the  hologram  and  detector 
dimension.  The  solid  lines  indicate  the  resulting  detector  signal-to-noise  ratio,  and  the  dashed  lines 
indicate  the  resulting  detector  efficiency.  This  plot  allows  one  to  select  the  hologram  and  detector 
array  dimensions  to  give  a  desired  contrast  ratio  at  the  detector. 


Figure 4. Contrast ratio at the detector as a function of the hologram diameter for a
wavelength variation of 4 nm.


CONCLUSION

We  developed  a  modeling  tool  for  a  3-D  computer  optical  system  design.  This  tool  allows 
one  to  select  the  parameters  of  the  optical  system  and  estimate  the  spatial  and  angular  misalignment 
tolerance  due  to  packaging  and  various  other  distortions.  This  modeling  tool  will  be  integrated  into 
a  CAD  system  for  3-D  computers. 
Motivations

Numerous digital optical computing architectures have been proposed and demonstrated in
the past years [1-4]. However, there is still considerable debate as to the value of these
architectures in general-purpose digital computing. The main issue is that the switching speed of
the devices employed is slower than that of electronic transistors. The continuously improving speed and
cost of electronic circuits can lead to the misleading view that the future of optics is exclusively in
the interconnection domain. We show here that in many practical applications, optoelectronic
circuits can perform logic operations faster than electronics. Previous comparisons of electronic
logic gates with their optoelectronic counterparts did not consider the effect of gate fanin and fanout
on in-circuit performance. Yet electronic gates suffer considerable performance degradation with
increasing fanin and fanout. This paper will address how to systematically exploit the inherent high
fanin and fanout abilities of optoelectronics to reduce circuit delay. We hope our work will provide
a new design paradigm for computer architects. In addition, since fanin and fanout behavior have
important implications for device designers, we hope it will provide guidance to device developers
by identifying the most desirable characteristics of optoelectronic logic elements from the circuit
designer's point of view.

Summary  of  work 

The  speed  of  a  digital  computer  is  determined  by  the  delay  characteristics  of  its  logic 
circuits  and  interconnections.  A  small  circuit  delay  is  the  base  on  which  architecture  optimization 
can  be  done  to  maximize  the  system  performance.  For  example,  the  bandwidth  and  latency  of  the 
components  of  the  memory  hierarchy,  the  maximum  clock  rate  of  pipelined  systems,  and  the 
efficacy  of  superscalar  machines  are  all  determined  by  circuit  delay  characteristics.  To  sharpen  the 
focus,  we  concentrate  our  discussion  on  the  combinational  circuit  delay  characteristics  of 
optoelectronic  circuits. 

Our  research  approach  was  to  employ  standard  VLSI  CAD  tools  and  benchmarks  to  calculate  the 
minimum  circuit  delays  of  electronic  and  optoelectronic  circuits,  and  ultimately  to  use  these  tools  to 
design  optimum  optoelectronic  circuits.  We  will  compare  the  latencies  of  electronic  gates  with 
optoelectronic  ones,  not  in  isolation,  as  previous  work  has  done,  but  in  complex  circuits,  where 
the  latency-degrading  effect  of  fanin  and  fanout  can  be  estimated.  This  technique  is  commonly  used 
by  electronic  circuit  designers  to  evaluate  alternative  electronic  circuit  designs,  but  has,  so  far  as  the 
authors are aware, not been used to study optoelectronic circuit design.

The three goals of this work are: (1) to determine the statistical relationship between
combinational circuit delay and gate fanin and fanout capabilities; (2) to determine the optimal
values of fanin and fanout of primary optoelectronic logic gates to provide the smallest circuit
delay; (3) to explore and define the most promising potential applications of optoelectronic
circuits in high speed digital computing systems. We will limit our discussion of logic gates to OR
and NOR gates throughout this paper. There is no loss of generality in this choice, since the NOR
gate is universal. Figure 1 shows schematically the circuitry of optoelectronic OR and NOR gates.
In order to quantify the improvement in circuit latency to be expected by increasing fanin
and fanout, the latency of sixteen benchmark circuits was calculated for fanins ranging from 2 to 8
and fanouts from 9 to 85. Figure 2 shows that increasing fanin from 2 to 8 results in a latency
improvement ranging from 71% to 259% as fanout is increased from 9 to 85. To make a
head-to-head comparison of optoelectronic vs. electronic gates, we mapped these sixteen benchmarks to
0.7 µm CMOS and optoelectronic circuits. The delay parameters used in the CMOS simulation are
from the MOTOROLA data book. Only OR and NOR gates were used for optoelectronic circuit
mapping, since these two gates are likely to be the fastest in an optoelectronic application.

Figure 3 shows the average circuit delay performance of optoelectronic circuits as a function
of the intrinsic delay of the optoelectronic gates, assuming that the average propagation delay between
any two gates is 167 ps, corresponding to a physical signal pathlength of 5 cm. The figure shows
that optoelectronic circuits are faster than electronic circuits when the intrinsic delay of the
optoelectronic gate is <1 ns, a figure readily achieved in practice.

The other strength of optoelectronic circuits is that their interconnection delay is smaller than
that of electronics when the load is heavy or the path length is long. This advantage is gained from the
superiority of optical interconnection. Contrast the one nanosecond per centimeter delay to be
expected of electrical interconnections in an IC with the 33 ps/cm propagation delay in the optical
domain. We have demonstrated a simple finite state machine (FSM) using 4 optoelectronic NOR
gates that runs at a measured clock rate of >300 MHz, although these gates are built from discrete
detectors, microwave amplifiers, and lasers, and the signal path length between two gates is as
long as 15 cm. With integrated versions of the optoelectronic NOR gate array and hologram used in
the counter, the gate delay and signal propagation will be reduced. We estimate a 500 MHz clock
rate is practical in an optoelectronic version of the FSM.

Using the techniques demonstrated in this paper, most small-grain-size logic blocks such as
decoders, cache managers, and interrupt controllers can be replaced by faster optoelectronic circuits
with average circuit depths of < 3 and average circuit delays of 1.3 - 1.8 ns. If large-grain-size
logic blocks such as fast electronic arithmetic processing units are controlled by small-grain-size
optoelectronic instruction decoders, cache/memory managers, and other desired control circuits, it
should be possible to push system clock rates beyond 500 MHz in a system having a volume of
20 x 20 x 20 cm^3. Since the longest signal propagation delay is < $\sqrt{3} \times 20$ cm / (30 cm/ns) = 1.15 ns,
a clock rate of 339 to 408 MHz (also the instruction issuing rate) will be feasible for the whole system.
A hybrid computer having one integer and one floating point unit can theoretically perform 400
MIPS and 100 MFLOPS. If multiple arithmetic-logic units (ALUs) are assembled within the same
volume and operate in parallel, the system throughput will be linearly increased, since the worst
case delay remains the same. The other advantage of such a hybrid structure is that it relieves the
power dissipation constraint on a large silicon chip, since the ALUs consume only a small part of
total power and the major power consumers are replaced by optoelectronic circuits which can be
partitioned into separate modules.
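
The delay figures quoted above can be checked with a few lines of arithmetic. The sketch below is an illustration, not the authors' calculation: it assumes the clock period is the sum of the average circuit delay (1.3 to 1.8 ns) and the worst-case propagation delay across the cube diagonal, rounded to 1.15 ns as in the text. That assumption reproduces the quoted 339 to 408 MHz range. The function names are hypothetical.

```python
import math

LIGHT_SPEED_CM_PER_NS = 30.0  # free-space value used in the text


def propagation_delay_ns(path_cm):
    """Time for light to travel path_cm in free space."""
    return path_cm / LIGHT_SPEED_CM_PER_NS


def clock_rate_mhz(circuit_delay_ns, cube_side_cm=20.0):
    """Assumed model: period = circuit delay + worst-case path delay."""
    diagonal_cm = math.sqrt(3.0) * cube_side_cm
    # Rounded to two decimals (1.15 ns) to match the figure quoted in the text.
    worst_path_ns = round(propagation_delay_ns(diagonal_cm), 2)
    return 1000.0 / (circuit_delay_ns + worst_path_ns)


gate_to_gate_ps = 1000.0 * propagation_delay_ns(5.0)  # 5 cm path between gates
slow_mhz = clock_rate_mhz(1.8)
fast_mhz = clock_rate_mhz(1.3)
```

The same relation gives the 167 ps gate-to-gate delay for a 5 cm path and the 33 ps/cm optical propagation figure.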

Conclusion 

The high fanin and fanout capability of optoelectronic gates can be exploited to
reduce circuit depth and delay. In all monochromatic optical systems, increasing fanin
beyond 2 may require more transmitter power, larger detectors, or a decrease in speed,
owing to the limitations of the constant radiance theorem. However, when the detectors must be
larger than the diffraction limit for other manufacturing reasons, fanin can be increased without
additional penalty up to a point. Detectors used in our designs were larger than this limit, and thus
meet this criterion. In this regime, where detector size is large compared to a diffraction-limited
spot, the high fanin and fanout of optoelectronic gates can be exploited to reduce circuit depth and delay.

The circuit depth decreases logarithmically as the gate fanin limit increases. For all the circuits
simulated, the optimal fanin is around 8, beyond which the gain in circuit depth increases very
slowly. The optimal fanout fluctuates greatly among the circuits. The average maximum fanout is
85. The circuit delays of 0.7 µm CMOS and virtual optoelectronic circuits have been simulated. The
result shows the average circuit delay of optoelectronic circuits is 1.7 times smaller than that of
CMOS circuits. This comparison indicates that the optoelectronic circuits are competitive with the
submicron CMOS circuits in those applications where the logic operations are simple, but fanin
number is large and fanout load is heavy.
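
The logarithmic dependence of depth on fanin can be seen in an idealized model. The sketch below is not the CAD-based mapping used in the paper; it only counts the gate levels needed to combine a given number of signals in a balanced tree when each gate accepts at most `fanin` inputs. The function name and the 64-input example are illustrative assumptions.

```python
def tree_depth(n_inputs, fanin):
    """Gate levels needed to combine n_inputs signals with at most `fanin` inputs per gate."""
    if fanin < 2:
        raise ValueError("fanin must be at least 2")
    depth = 0
    signals = n_inputs
    while signals > 1:
        signals = -(-signals // fanin)  # ceiling division: gates needed at this level
        depth += 1
    return depth


# Depth of a 64-input function for several fanin limits.
depths = {f: tree_depth(64, f) for f in (2, 4, 8, 16)}
```

In this model the depth falls from 6 levels at a fanin of 2 to 2 levels at a fanin of 8, and doubling the fanin again to 16 gives no further reduction, which matches the diminishing returns described above.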


Figure 3. Circuit delay comparison. (Axis: intrinsic delay of optoelectronic gates, ps.)
1. Introduction

We have proposed the stacked optical computing system (STOCS), which has advantages of
mechanical stability and miniaturization compared to conventional optical systems using lenses and
beam splitters. The system has many processing units (STOCS-PUs) which consist of planar
optical devices such as a functional interconnection device (FIC), an optically addressable spatial
light modulator (SLM), and a reading light supplier (RLS) [1,2]. Functions of FICs are image
splitting, image combining, and other space-invariant or space-variant interconnections. Output images from
the FIC are written on the writing side of the SLM. The RLS is placed on the SLM, and it supplies
reading light to the reading side of the SLM and transmits reading images from the SLM. We
demonstrated the read-out function of the RLS, and images directly written on the SLM were
successfully read out.

The discrete digital correlation (DDC) is an essential operation for optical digital computing.
Several optical systems for the DDC have been proposed and demonstrated. In this paper, we
describe the DDC using the STOCS-PUs. The functions of the STOCS-PUs in the experiment are
represented by the operation kernels of two points, and we implemented the XOR operation for a
coded binary image with them.

2.  Devices  in  Experiment 
A. FIC

Interconnection devices consisting of fiber optics ribbons were used in the "Tse computers."
We manufacture FICs by stacking thin fiber ribbons which have slantwise light axes. Figure 1 shows
the FICs used in the experiment. The FIC in Fig. 1(a) is a vertical pattern shift combiner. The size of
the fiber ribbons is 7.1 mm x 26 mm, and the thickness is 250 µm. The slantwise angles of the
fiber ribbons are ±10°, which causes a shift width of ±1.25 mm. The fiber ribbons are stacked
with alternating slantwise light axes. The FIC splits an input image into two vertically shifted
images as the output. Its function is represented by a kernel of two points lined up vertically for input
patterns of 2.5 mm square pixels. The FIC in Fig. 1(b) is a diagonal pattern shift combiner. The length
of the fiber ribbons is 10 mm, and the thickness and the slantwise angles are the same as in Fig. 1(a).


Fig. 1 Schematic diagram of FICs and operation kernels: (a) vertical pattern shift combiner, and
(b) diagonal pattern shift combiner.
The fiber ribbons are stacked diagonally, and the diagonally shifted patterns are obtained as the
output. Its function is represented by the kernel of two points lined up diagonally.
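
The quoted ribbon dimensions are consistent with simple slant geometry. The sketch below assumes that the lateral shift equals the ribbon length along the light path times the tangent of the slant angle, that the 7.1 mm dimension is that length for the vertical combiner, and that the diagonal combiner shifts along a 45° diagonal. None of these assumptions is stated in the text; they are inferred from the numbers.

```python
import math


def lateral_shift_mm(ribbon_length_mm, slant_deg):
    """Sideways displacement of light guided through a slanted fiber ribbon."""
    return ribbon_length_mm * math.tan(math.radians(slant_deg))


# Vertical combiner: 7.1 mm ribbons slanted at 10 degrees.
vertical_shift = lateral_shift_mm(7.1, 10.0)

# Diagonal combiner: 10 mm ribbons; the shift is split equally between
# the two axes under the assumed 45 degree diagonal.
diagonal_shift = lateral_shift_mm(10.0, 10.0)
per_axis_shift = diagonal_shift / math.sqrt(2.0)
```

Under these assumptions both combiners displace the image by about 1.25 mm per axis, so the two shifted copies are separated by one 2.5 mm pixel.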

B. SLM

A liquid crystal SLM is used in the experiment. The size of the SLM is 35 mm x 35 mm x 10
mm and the active area is about 15 mm x 15 mm. This SLM has two features. The first feature is
that a fiber plate is used as the substrate of the writing side. This fiber plate of the writing side can
transmit writing images to the photoconductive layer on the fiber plate with little degradation.
The second feature is that the SLM has an internal electrode layer between the photoconductive
layer and the reflecting layer, which is divided into electrode cells corresponding to pixels. The
internal electrode reshapes a deformed or striped writing pattern from the FIC, because it keeps the
electric potential in a cell uniform. The size of the cells is 2.4 mm x 2.4 mm, and the gap between
the cells is 0.1 mm.

C. RLS

An RLS, which consists of a fiber plate and a redirector, converts the light from the He-Ne laser
to the uniform reading light. A redirector attached to the fiber plate has micro conical hollows
which change the direction of the confined light in the fiber plate by reflection, and the reflected
light irradiates the SLM as reading light. The size of the fiber plate of the RLS is 30 mm x 30 mm
x 5 mm. The ratio of the reading light intensity to the stray light intensity is 9, and the power
efficiency of the RLS is about 5%.

3. Experiment

Figure 2 shows an experimental configuration of the DDC implementation using the
STOCS-PU. An input pattern is put on the input side of the FIC, and uniform light from a halogen lamp
illuminates the input pattern. The shift-combined patterns from the FIC are written on the writing
side of the SLM through the decoding mask. Images on the SLM are read out as follows. Light
from the He-Ne laser is introduced to the RLS through the optical fibers and fiber collimators, then
the reading light from the RLS illuminates the reading side of the SLM. The main polarizer between
the SLM and the RLS polarizes the reading light from the RLS, and converts the
polarization-modulated light from the SLM into intensity-modulated light when the light passes through the
main polarizer again. The intensity-modulated light propagates through the RLS and the reduction
polarizer, and it is obtained as an output image of the STOCS-PU. The pass direction of the
reduction polarizer is aligned with the polarizing direction of the intensity-modulated light. The
reduction polarizer improves the contrast of the read-out images because it reduces the stray light from
the redirector, which has random polarization.

Photographs of the input and output images of each device of the STOCS-PU using the FIC of
the diagonal pattern shift combiner are shown in Fig. 3. The input image of the STOCS-PU has 16
pixels of 2.4 mm x 2.4 mm in Fig. 3(a), which represents the coding patterns of two binary
images of a_ij = 1,0,1,0 and b_ij = 1,1,0,0. Figure 3(b) shows the output image from the FIC, which
is the superposition of two diagonally shifted patterns. The output image becomes a striped pattern
because the light axes of the stacked fiber ribbons of the FIC alternate. This output
image is written on the SLM through the decoding mask. The readout image from the SLM through
the RLS is shown in Fig. 3(c). Two bright pixels at the positions representing c_ij = 0,1,1,0 are
observed as an output of the STOCS-PU, and the result of the XOR operation is obtained.

In the output from the STOCS-PU using the FIC of the vertical shift combiner, bright pixels
appear in the positions representing the output d_ij = 1,1,0,0; that is, the STOCS-PU implements
the B operation for the coded input image.

Fig. 2 Schematic configuration of the STOCS-PU for DDC. (Labeled components: main polarizer,
He-Ne laser, 1x4 fiber coupler.)

Fig. 3 Input and output images of the devices of the STOCS-PU using the FIC of the diagonal
shift combiner: (a) input image, (b) output from the FIC, (c) output from the PU.

4.  Conclusions 

We have implemented the DDC using the STOCS-PUs, and demonstrated B and XOR
operations for the coded binary image. The STOCS-PUs are stable and compact because they are
constructed by stacking planar devices. The experiment shows the ability of the STOCS-PU to
perform the DDC, and STOCS-PUs will implement all 16 kinds of operations for coded binary
images using the FIC of the four-pattern shift combiner.
Introduction

For several years our research group has investigated the design of computer generated
holograms, with an eye towards producing any desired pattern of optical interconnects. We've
investigated both amplitude and phase holograms, and a variety of design algorithms to
produce each. We have decided to concentrate our efforts on the phase-only holograms based
on their superior reconstruction ability, diffraction efficiency, and the flexibility to produce
interconnects on axis. In general, multi-level phase holograms require an iterative design
approach to produce adequate results. We have developed a hybrid method which couples the
Gerchberg-Saxton algorithm with a random search procedure. We have quite successfully used
these routines to produce dozens of holograms for optical computing applications, including
implementing weights between the synapses of an optical neural network, producing a
reference table in an optical A/D converter, and computing carry look-ahead bits in an
optoelectronic adder circuit. The method has become useful enough that we have packaged the
design software with a graphical interface to make it easier to use and more generally
applicable. The rest of the paper is dedicated to a description of this software package, some
issues involved with manufacturing, and directions for future work.

Design  Constraints 

To initiate a hologram design, the user must specify the inputs and outputs of the system.
The inputs consist of the hologram depth (number of phase levels) and dimensions in number of
pixels. The output is the desired interconnect pattern in optical intensity. The hologram is
modeled as a pixelated square with dimensions of $2^n \times 2^n$ pixels. Each pixel within the
hologram has a square shape, a transmission amplitude of one, and assumes one of m discrete,
uniformly distributed phase levels. No physical dimensions are used in the code, so the same
hologram solution may be scaled for use at any wavelength or pixel size.

The designed hologram encodes a pattern that is realized when illuminated with a constant
amplitude, monochromatic plane wave and viewed in the Fraunhofer diffraction region. The
number of actual connection points formed in the image plane is well in excess of what the
designer can control, because of multiple diffraction orders. There is a fundamental pattern
centrally located, and a series of duplicate images surrounding it at the higher orders. Only the
points contained within the fundamental pattern may be independently controlled. The
fundamental pattern exhibits the same pixelated structure as the hologram, with the same
dimensions of $2^n \times 2^n$ pixels. Unlike the hologram, however, the desired intensity values are
assigned from a continuous range. While every pixel in the fundamental pattern may contain an
interconnect, in practice only a subregion of the space is so utilized. The additional degrees of
freedom given to the problem yield a more accurate reconstruction. Initially all unconstrained
pixels are set to zero and then allowed to vary freely as the design iterates. This allows for
some light to be scattered outside the region of interest, in exchange for better reconstruction of
the targeted weights. As long as more than two phase levels are used in the hologram design,
the region of interest may be placed on or off axis. If only two phase levels are used, an off-axis
design is required to avoid conflicting with the Hermitian conjugate image.

Since the hologram pixels have a uniform square feature shape, the resultant connection
pattern will show a 2-D sinc-squared rolloff in intensity. This effect can be compensated in the
design by multiplying the desired connection weights by an inverse sinc-squared function. This
simple adjustment allows accurate connection weights to be achieved, with sampling of the
hologram at the pixel spacing. The same effect could be achieved with a finer sampling of the
hologram; however, this comes at the added expense of increased computational time and
memory requirements.
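
The compensation described above can be sketched as follows. This is an illustration under stated assumptions, not the package's code: the pixels are taken to fill the whole pixel pitch, the optical axis is placed at index n // 2 of the target array, and the envelope at output sample k of an n-sample pattern is taken as sinc²(k/n) along each axis. The function names are hypothetical.

```python
import math


def sinc(x):
    """Normalized sinc, sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def compensate(target):
    """Divide each desired intensity by the 2-D sinc-squared envelope at its position."""
    n = len(target)
    compensated = []
    for row in range(n):
        ky = row - n // 2                      # vertical offset from the optical axis
        out_row = []
        for col in range(n):
            kx = col - n // 2                  # horizontal offset from the optical axis
            envelope = (sinc(kx / n) * sinc(ky / n)) ** 2
            out_row.append(target[row][col] / envelope)
        compensated.append(out_row)
    return compensated


uniform = [[1.0] * 8 for _ in range(8)]
boosted = compensate(uniform)
```

A spot on axis is left unchanged, while a spot at the edge of the fundamental pattern is boosted by a factor of π²/4, about 2.5, to offset the rolloff.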

Design  Algorithm 

The design of the hologram requires the user to control a two-step process that converges to
a discrete phase-level solution. The first step in the design process uses the Gerchberg-Saxton
iterative algorithm. This technique cycles between hologram and image space, applying
amplitude constraints in each. The hologram is constrained to have unit amplitude, while the
image amplitude is constrained to form the desired connection pattern. In strict
Gerchberg-Saxton the phase is allowed to vary freely until a stable solution is reached. In our
implementation with discrete phase levels, the algorithm has been modified such that as the
routine progresses the phase is gradually constrained according to a user-definable schedule.

This algorithm works very well in finding solutions with high diffraction efficiency and
reasonable connection strength accuracy, if given adequate space-bandwidth product. To
improve the result, a second step can be applied using a random perturbation technique. In this
procedure randomly selected hologram pixels are assigned new phase levels and the effect on
the connection pattern is evaluated. Only changes that improve the reconstruction accuracy are
kept. The stopping criterion for both algorithms is either a preset number of iterations, a
manual user interrupt, or automatic conditions such as rate of change in RMS error.
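
The two-step procedure can be sketched with NumPy as below. This is a minimal illustration, not the authors' software. Several choices are assumptions: the linear schedule that pulls phases toward the allowed levels, the normalized RMS measure, the fixed iteration counts, and the function names. For brevity the sketch constrains every image-plane pixel, whereas the text lets unconstrained pixels vary freely.

```python
import numpy as np


def quantize(phase, levels):
    """Snap phases to the nearest of `levels` uniformly spaced values in [0, 2*pi)."""
    step = 2 * np.pi / levels
    return (np.round(phase / step) % levels) * step


def rms_error(phase, target_amp):
    """Normalized RMS error between desired and diffracted irradiance at the target spots."""
    intensity = np.abs(np.fft.fft2(np.exp(1j * phase))) ** 2
    mask = target_amp > 0
    got = intensity[mask] / intensity[mask].sum()
    want = target_amp[mask] ** 2 / (target_amp[mask] ** 2).sum()
    return float(np.sqrt(np.mean((got - want) ** 2)))


def gerchberg_saxton(target_amp, levels, iterations, rng):
    """Step 1: cycle between planes while gradually tightening the phase constraint."""
    step = 2 * np.pi / levels
    phase = rng.uniform(0, 2 * np.pi, target_amp.shape)
    for k in range(iterations):
        image = np.fft.fft2(np.exp(1j * phase))            # unit-amplitude hologram to image plane
        image = target_amp * np.exp(1j * np.angle(image))  # impose the desired amplitude
        phase = np.angle(np.fft.ifft2(image))              # keep only the phase in the hologram plane
        weight = (k + 1) / iterations                      # assumed linear schedule
        nearest = np.round(phase / step) * step
        phase = phase + weight * (nearest - phase)         # pull phases toward the allowed levels
    return quantize(phase, levels)


def random_search(phase, target_amp, levels, trials, rng):
    """Step 2: try random single-pixel changes and keep only improvements."""
    step = 2 * np.pi / levels
    phase = phase.copy()
    best = rms_error(phase, target_amp)
    for _ in range(trials):
        i = rng.integers(phase.shape[0])
        j = rng.integers(phase.shape[1])
        old = phase[i, j]
        phase[i, j] = rng.integers(levels) * step
        err = rms_error(phase, target_amp)
        if err < best:
            best = err
        else:
            phase[i, j] = old                              # reject the change
    return phase, best


rng = np.random.default_rng(0)
target = np.zeros((16, 16))
target[2, 3] = target[5, 9] = 1.0
target[11, 4] = 0.5                                        # weaker connection (amplitude)
start = gerchberg_saxton(target, levels=4, iterations=30, rng=rng)
start_error = rms_error(start, target)
final, final_error = random_search(start, target, levels=4, trials=200, rng=rng)
```

Because the second step accepts a change only when the error drops, the final error can never exceed the error of the Gerchberg-Saxton result it starts from.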

Image  Characterization 

The  software  evaluates  the  success  of  the  solution  by  calculating  an  RMS  error  between  the 
desired  pattern  and  the  diffracHon  pattern.  The  RMS  calculation  can  be  set  to  either  a 
normalized  mode  or  an  absolute  mode.  In  designing  a  single  hologram,  the  normaliz^  RMS  will 
compare  irradiance  values  on  a  relative  scale,  leading  to  a  maximum  diffraction  efficiency. 
However,  when  designing  several  holograms  to  be  used  in  conjunction,  the  irradiance  levels  need 
to  be  controlled  relative  to  one  another.  In  this  case  an  absolute  scale  is  set  by  the  dimmest 
connection  pattern  and  the  rest  of  the  holograms  are  forced  to  reduce  their  diffraction 
efficiencies  to  match. 
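
As a rough illustration of the absolute mode, the sketch below computes the reduced efficiency each hologram would need so that all connections carry the same irradiance. It assumes equal illumination of every hologram and equal-weight connections within each one. The function name and the numbers in the example are hypothetical.

```python
def match_to_dimmest(efficiencies, fan_outs):
    """Return the diffraction efficiency each hologram needs so that every
    connection, in every hologram, carries the same irradiance."""
    # Irradiance per connection, assuming equal-weight connections.
    per_connection = [eta / n for eta, n in zip(efficiencies, fan_outs)]
    common = min(per_connection)  # absolute scale set by the dimmest pattern
    return [common * n for n in fan_outs]

# Hypothetical example: three holograms with 4, 8 and 16 connections.
targets = match_to_dimmest([0.80, 0.72, 0.64], [4, 8, 16])
```

The hologram with the dimmest connections keeps its full efficiency, and the others are reduced to match it.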

Diffraction  efficiency  is  calculated  by  the  code,  and  can  be  used  as  a  design  constraint. 

Cross  talk  of  neighboring  connections  is  also  considered.  To  improve  the  energy  confinement  of 
individual  connections,  and  therefore  the  reconstruction  accuracy,  replication  of  the 
fundamental hologram is utilized. A larger overall hologram size leads to an overall smaller
point  spread  function.  In  the  final  analysis  of  hologram  performance,  increased  sampling  of  the 
connection  pattern  is  used  to  provide  accurate  estimates  of  diffraction  efficiency  and  the  energy 
distribution  in  the  output  plane. 

Manufacturing  Description 

The final feature of the code is concerned with fabrication of the hologram. An illumination
wavelength  and  material  index  are  requested  to  convert  the  phase  distribution  into  a  surface- 
relief  map.  A  material  index  of  -1  may  be  used  to  specify  depths  for  a  reflection  hologram. 
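
The conversion can be sketched as follows, assuming the usual thin-element relation for a relief in air, depth = phase x wavelength / (2 pi (n - 1)). With n = -1 the same expression gives phase x wavelength / (4 pi) in magnitude, which is the depth needed in reflection because the step is traversed twice. The function names are illustrative and are not taken from the package.

```python
import math

def relief_depth(phase, wavelength, index):
    """Surface-relief depth producing `phase` (radians) at `wavelength`.

    Transmission through a material of refractive index `index` in air:
        depth = phase * wavelength / (2*pi*(index - 1)).
    An index of -1 yields the reflection depth phase*wavelength/(4*pi).
    """
    return abs(phase * wavelength / (2 * math.pi * (index - 1)))

def depth_map(levels, wavelength, index):
    """Depths for `levels` equally spaced phase levels covering [0, 2*pi)."""
    return [relief_depth(2 * math.pi * k / levels, wavelength, index)
            for k in range(levels)]
```

The depths come out in the same units as the wavelength argument.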

Pixel dimensions are never used in the code, as their effect will only be to change the overall
magnification  of  the  connection  pattern.  It  is  important  to  remember  in  choosing  a  pixel  size 
however,  that  the  design  assumes  the  realm  of  scalar  diffraction.  The  smallest  feature  size  must 
always  remain  a  few  factors  larger  than  the  illumination  wavelength. 

We  have  collaborated  with  Dr.  Paul  Maker  of  JPL  to  use  a  one-step,  direct  write  e-beam 
method  for  producing  multi-level  phase  holograms^.  The  PMMA  material  used  to  encode  the 
hologram  has  an  etch  rate  directly  related  to  e-beam  exposure.  This  fabrication  method  suffers 
from  a  side-wall  etching  problem  in  acetone  development,  which  produces  pyramid  shapes  in 
the  material  rather  than  square  towers.  We've  noticed  significant  deterioration  of  image  quality 
from  this  effect,  and  are  currently  working  to  counteract  it.  The  code  includes  a  model  of  this 
process  that  can  be  applied  after  the  design  to  predict  the  final  image  quality,  or  during  the 
design  to  compensate  for  the  isotropic  etching.  The  model  assumes  that  the  structure 
introduced  by  the  etch  process  will  be  subwavelength  and  should  not  be  explicitly  modeled  in 
the  diffraction.  Instead,  an  effective  medium  approach  is  taken,  which  adjusts  each  pixel  height 
to  accurately  reflect  the  total  amount  of  material  left  after  development. 

Results 

The software package was developed on DEC Ultrix and OSF operating systems using the
C programming language. To increase user friendliness, a graphical interface was attached using
the X11R5 Toolkit and the Motif widget set. This interface allows the user to easily define
connection patterns, control the design process, and view the results.

As an example of the software speed, running the design of a 256x256, 64 phase level
hologram encoding a 64x64 interconnect pattern (Fig. 1a) on a DEC Alpha 3000 300X required
1.5 minutes to run 60 iterations of the Gerchberg-Saxton algorithm. The on-axis solution found
(Fig. 1b) has an RMS error of 3.8% and a diffraction efficiency of 42%. The DC component was
reduced to its ideal weight. Characterization of the optically illuminated hologram (Fig. 1d)
yields a rough RMS error of 27% and a diffraction efficiency of 49%. The DC value is roughly
3-5 times its designed value. When the fabrication process is modeled (Fig. 1c), the predicted
RMS error, diffraction efficiency, and DC component all fall to within a factor of 2 of the
measured values.

Conclusion 

Manufacturing  errors  are  currently  the  largest  obstacle  to  the  production  of  high  quality 
multi-level phase holograms. We've made a start at understanding and modeling the errors, but
more  work  needs  to  be  done  in  characterization.  If  the  errors  are  predictable,  then  they  may  be 
compensated  in  the  design.  Otherwise  the  fabrication  process  needs  to  be  improved. 

The  software  package  described  represents  our  current  successes  in  the  design  of  multi-level 
phase  Fourier  Transform  holograms.  The  code  provides  a  flexible  design  and  analysis  tool  for 
the  construction  of  free-space  optical  interconnects.  High  diffraction  efficiencies  and  low 
reconstruction errors are achieved when sufficient space bandwidth product is supplied.


Figure  1:  The  first  three  pictures  show  calculated  images  where  the  value  of  the  connection 
point  has  been  smeared  into  a  square  the  size  of  the  pixel.  Left  to  right,  top  to  bottom,  a)  the 
desired  pattern  b)  the  pattern  produced  from  a  computer  designed  hologram  c)  the  pattern 
produced including a fabrication model d) photograph from an optically illuminated hologram

Detection of x-y Misalignment Error Using Optical Crosstalk in a
Lenslet-Array-Based Free-Space Optical Link

Introduction

The McGill Photonic Systems Group is currently developing a free-space terabit/second
capacity optical backplane implementing the Hyperplane architecture [1]. A preliminary
representative portion of this backplane now being developed consists of a VCSEL-MSM link
between two Printed Circuit Boards (PCBs). A simplified optical schematic is shown in Figure 1;
two lenslet arrays relay signal beams generated by an array of VCSELs to the MSM device array.

The  detection  and  eventual  correction  of  misalignment  errors  between  the  VCSELs  and  MSMs 
is one key problem that must be solved if the system is to function effectively. One way of
ensuring proper alignment is to implement active alignment, a process in which system parameters
such as throughput or error in spot position are monitored and fed back to a controller which
realigns the system by altering the state of the optics, as is the case in Compact Disk players [2].


Error detection

An active alignment scheme requires a simple, reliable method of detecting a misalignment
error. This paper proposes a scheme for detecting lateral (i.e. in the x-y direction) misalignment
errors in a lenslet-based optical interconnect. The lenslets are assumed to be prealigned with high
accuracy to the device planes, for example by gluing the lens substrates to the packages or actual
dies; additionally, the light emitted by the light sources is assumed to have a Gaussian irradiance
profile.  The  basic  principle  is  outlined  in  Figure  2,  and  is  explained  as  follows. 

If the lenslets are perfectly aligned with respect to each other, as shown in Figure 2a, then the
optical crosstalk will be negligible. On the other hand, if there is a lateral misalignment between the
lenslets, the crosstalk component will be steered into an adjacent ‘alignment’ detector. There are
thus two kinds of detectors: conventional ‘signal’ detectors for data and ‘alignment’ detectors for
alignment information.

For an array of square lenslets and an array of detectors on a grid of pitch P, as shown in
Figure 3, the light coupled into the alignment detectors due to misalignment crosstalk is a function
of the misalignment error (Δx, Δy). For example, the crosstalk components steered into detectors
A and B, P_A and P_B, are given by equations (1) and (2) respectively:

where 3w = P.

Circular  lenses  will  have  bounds  of  integration  different  from  those  above  [3]. 

The photocurrents produced in the alignment detectors will be proportional to the optoelectronic
sensitivity S. By running the photocurrents across a resistor, as shown in Figure 4b, voltage
swings V_A and V_B, proportional to P_A and P_B respectively, will be created; these voltages will yield
information about the misalignment and subsequently can be used to realign the system.
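
The dependence of the alignment voltage on the misalignment can be illustrated with a short Python sketch. It is not a reproduction of equations (1) and (2). It assumes a circular Gaussian beam of 1/e^2 radius w = P/3 at the receiving lenslet plane, assumes that all light entering the neighbouring square lenslet aperture reaches that lenslet's alignment detector, and uses a pitch of 250 µm with the values of R, S and beam power quoted in the text. The function names are hypothetical.

```python
import math

def gaussian_power_in_rect(total, w, x0, x1, y0, y1):
    """Power of a Gaussian beam (1/e^2 radius w, centred at the origin)
    falling inside the rectangle [x0, x1] x [y0, y1]."""
    k = math.sqrt(2) / w
    fx = 0.5 * (math.erf(k * x1) - math.erf(k * x0))
    fy = 0.5 * (math.erf(k * y1) - math.erf(k * y0))
    return total * fx * fy

def alignment_voltage(dx, dy, pitch=250e-6, total=1e-3, sens=0.5, res=20e3):
    """Voltage from the alignment detector behind the +x neighbouring lenslet."""
    w = pitch / 3.0  # assumed beam radius (w = P/3)
    # Neighbouring aperture, in coordinates centred on the displaced beam.
    power = gaussian_power_in_rect(total, w,
                                   pitch / 2 - dx, 3 * pitch / 2 - dx,
                                   -pitch / 2 - dy, pitch / 2 - dy)
    return sens * res * power  # V = S * R * P_A
```

Evaluating `alignment_voltage(dx, 0.0)` for `dx` between 0 and `pitch / 2` gives a voltage that rises monotonically with the misalignment under these assumptions.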

Figures 5 and 6 show plots of the optical power coupled into a signal detector as well as V_A and
V_B for various misalignment values and directions, assuming P = 250 µm, R = 20 kΩ, S = 0.5 A/W,
and total power contained per incident signal beam = 1 mW. Figure 5a shows the total power
coupled into a signal MSM and Figures 5b and 5c respectively show V_A and V_B as Δy is held at 0
µm and Δx is swept from 0 to 250 µm. Figure 6a shows the total power coupled into a signal
MSM and 6b and 6c respectively show V_A and V_B as Δy and Δx are swept along the Δy = Δx
line for 0 < Δx < 250 µm. As can be seen in Figures 6b and 6c, geometric considerations indicate
that for a diagonal displacement, the Δx error must be under ~125 µm (P/2) or the measured error
signal will yield ambiguous information about the misalignment. Experimental results obtained
using an array of MSMs on a 250 µm pitch are in agreement with the above theoretical principles.

Conclusion 

The  main  advantage  of  using  the  crosstalk  technique  is  that  no  dedicated  alignment  beam  is 
required  to  obtain  information  on  misalignment  errors;  as  a  result,  the  optical,  optomechanical,  and 
circuit  layout  complexity  are  reduced  when  compared  to  other  techniques  for  acquiring 
misalignment  errors,  such  as  quadrant  detectors.  This  advantage,  however,  comes  at  the  expense 
of  available  real  estate  on  die:  a  quadrant  detector  requires  4  detectors  to  determine  an  x-y  error 
whereas  the  crosstalk  technique  described  here  requires  8.  In  both  cases,  however,  the  pin-outs 
could be reduced by multiplexing the alignment detector signals.

Figure 1: Schematic of the optical link

Figure 2: Crosstalk for detecting misalignment error
a) no error b) misalignment error

Figure 3a): Lenslet array layout with perfectly aligned incident beams b) misalignment of (Δx, Δy)


Figure 4a): Detector array layout with signal and alignment detectors; 4b) Bias network for an
alignment detector.

Figure 5: Power coupled and misalignment voltages for Δy = 0 (0 < Δx < 250 µm).

Figure 6: Power coupled and misalignment voltages for Δy = Δx (0 < Δx < 250 µm).

Introduction

The  benefits  of  optical  interconnections  to  overcome  the  limitations  of  conventional  metallic 
connections  in  electronic  computers  are  widely  recognised.  To  demonstrate  the  potential  of 
optics  several  systems  have  been  constructed  to  interconnect  optoelectronic  arrays  of  devices. 
So-called "smart-pixel" devices are seen as a promising technology in future optoelectronic
parallel  processing  systems. 

In  order  to  be  competitive  with  current  high-performance  electronic  machines,  optoelectronic 
machines  have  several  requirements.  These  include  large  arrays  of  fast,  low-power,  low-cost 
processing  elements  integrated  with  optical  input  and  output  channels;  and  future  powerful 
computers  should  ideally  take  advantage  of  the  global  interconnection  capability  of  optics.  The 
additional  requirements  of  a  compact,  reliable,  and  cost-efficient  optical  interconnection  system 
point  towards  the  use  of  micro-optical  technologies  and  components. 

Two promising optical technologies which satisfy the above requirements and which we
consider  in  this  paper  are  GRIN  rod  lenses,  which  have  been  used  to  interconnect  arrays  of 
photothyristor  devices  [1],  and  planar  microlenses  fabricated  by  ion-exchange  in  glass  [2,3]. 
We  describe  the  requirements  of  the  optical  system  for  the  interconnection  of  arrays  of 
optoelectronic  processing  elements  and  shall  present  details  of  the  theoretical  and  experimental 
performance  of  both  technologies.  From  the  physical  properties  of  the  lenses,  we  derive  some 
criteria  which  allow  a  comparison  of  the  two  technologies  for  a  given  imaging  task. 

Optical  System  Requirements 

To  design  an  optical  imaging  system  for  interconnecting  arrays  of  optoelectronic  devices  it  is 
necessary  to  consider  the  physical  and  optical  characteristics  of  the  devices.  To  date  some  of 
the  most  promising  devices  available  for  information  processing  systems  are  arrays  of  S- 
SEEDs,  which  are  optically  controlled  modulators,  and  arrays  of  differential  pair  PnpN 
photothyristors,  which  are  incoherent,  broadband  light  sources  and  detectors.  Although  the 
optical  characteristics  of  these  devices  are  significantly  different  there  are  some  common 
requirements  that  the  imaging  system  must  satisfy  for  these  and  other  devices. 

Typically, device arrays of the order of 64x64 to 128x128 have been available to date. The
spacing between devices is limited by VLSI fabrication technology. Typically the device pitch
is of the order of 10-50 µm for relatively ‘unsmart’ pixels which contain little or no integrated
electronics, and can be as high as 200-300 µm for ‘smart’ pixels. To be competitive with
electronics, future optoelectronic components must have a higher communication bandwidth
and interconnection density than those available until now. It would be conceivable for typical
device arrays to have dimensions of the order of a few millimetres to a few centimetres. In
addition, in order to be as fast as possible, the optical detectors must be kept as small as
possible.  S-SEED  arrays  have  windows  «:!5pm  and  current  photothyristor  arrays  have  windows 
≈10-30 µm. The size of the windows will determine the resolution required by the imaging
system. In addition, the imaging system must exhibit little or no distortion.

The  speed,  and  therefore  overall  processing  power,  of  an  optoelectronic  computing  system  will 
be  determined  not  only  by  the  device  characteristics  (sensitivity,  optical  output  power,  internal 
capacitance  etc.)  but  also  by  the  efficiency  of  the  imaging  system.  Therefore  resolution  must  be 
maximised  so  that  the  image  spot  falls  completely  within  the  device  window,  and  distortion, 
vignetting,  and  other  aberrations  in  the  optical  system  must  be  minimised. 

The above imaging requirements can be fulfilled by bulk optics; however, due to issues of
compactness we investigate microoptic alternatives. Moreover, we limit the present study to a
comparison  of  GRIN  rod  lenses  with  planar  microlenses  fabricated  by  ion-exchange  in  glass 
since  these  technologies  were  available  to  us.  From  the  above  discussion  it  becomes  clear  that 
certain  parameters  and  issues  are  important;  (i)  a  large  number  of  pixels  must  be  imaged 
which  means  a  large  field  of  view  with  high  resolution  and  high  NA;  (ii)  the  uniformity  of 
image  illumination  is  important  due  to  the  limited  operation  range  of  some  devices;  (iii) 
distortion  must  be  low  to  maximise  energy  coupled  into  the  small  detector  windows  which  are 
necessary  for  fast  device  operation;  (iv)  the  overall  efficiency  of  the  imaging  system  must  be 
high  to  reduce  the  optical  insertion  loss;  and  (v)  it  is  desirable  to  reduce  the  system  volume  by 
using microoptic components. A further issue which must be considered is access to a Fourier
plane  for  inserting  elements  to  perform  an  interconnection  topology  that  is  more  complex  than 
a  simple  1:1  interconnection. 

Measurement  of  Optical  Characteristics 

We  have  evaluated  GRIN  rod  lenses  which  were  obtained  from  the  Gradient  Index 
Corporation  (BG50).  They  have  a  diameter  of  5mm  and  a  length  of  30.06mm.  The  rods  are 
slightly  less  than  quarter  pitch  (approx.  0.2  pitch)  to  allow  a  longer  working  distance.  The  rods 
can  be  used  in  two  configurations;  (i)  an  image  at  the  input  face  will  be  collimated  at  the  exit 
face  -  two  such  rods  can  be  used  together  in  an  equivalent  of  the  4f  configuration,  and  (ii)  a 
single rod lens can image an object that is placed at a distance from the lens face. When used in the
second  configuration  the  lenses  have  a  working  distance  of  ~16mm  for  unit  magnification. 

The planar ion-exchange microlenses have been fabricated in our own facilities. The index
profile is produced by silver-sodium ion exchange in glass and is described fully in [3].
Typically they have a diameter of 250 µm, a focal length of 1000 µm, and an NA of 0.1.

We have used a setup which allows a test pattern illuminated by an LED to be imaged by the
GRIN rods or the ion-exchange microlenses. We have various test patterns for measuring the
resolution, image spot sizes, and distortion. Figure 1 shows the images of spot array test
patterns produced by (a) a BG50 GRIN rod and (b) an ion-exchange microlens. The test pattern
spot diameter was 6.2 µm with a pitch of 25 µm. The size of the image spots is
approximately 8-9 µm in (a) and 7-8 µm in (b). The images shown are 400 µm in
width and across this field size no appreciable increase in the spot size is discernible. In
addition some slight distortion can be seen in the GRIN image of Fig. 1(a). The line scans of
Fig. 1(c) and (d) show good intensity uniformity across the field.

Figure 1. Images of spot array pattern imaged by (a) BG50 GRIN rod lens, and (b) ion-exchange microlens.
The test pattern spot size was 6.2 µm with a pitch of 25 µm. (c) and (d) show line scans of images in (a) and (b).

The  measurements  described  above  provide  information  on  the  maximum  object  field  size,  spot 
sizes/resolution,  distortion,  and  uniformity.  It  is  necessary  to  observe  these  effects  at  larger 
fields  to  determine  the  performance  limits  of  the  lenses.  Theoretical  modelling  of  the  lenses  by 
ray  tracing  analysis  supports  the  observed  performance. 

The results of theoretical modelling and experimental measurements of GRIN rod lenses and
planar ion-exchange microlenses allow the evaluation of both technologies for the
interconnection of optoelectronic device arrays. We shall present a detailed study of the
technologies which allows a derivation of figures of merit that include the optical performance
and system volume issues for a given imaging task.

Recently differential pairs of PnpN optical thyristors
have  been  developed  for  use  within  optical  computing 
architectures  [1].  These  emitter-receiver  elements  have  a 
fast  turn-on  and  turn-off  time  (10  nsec)  together  with 
high  sensitivity  (50  fJ  absorbed  optical  switching 
energy).  A  relatively  simple  fabrication  process  allows 
large  arrays  (16  x  16)  of  these  devices  to  be  constructed 
with a pitch of 50 µm.

In  this  paper  we  describe  the  development  of  a  suitable 
optical  system  to  image  data  between  optical  thyristor 
arrays.  The  optical  system  must  meet  several 
requirements.  It  should  be  compact,  in  order  to  allow 
systems  containing  many  processor  planes  to  be 
constructed,  it  must  allow  an  object  size  of  at  least  2x2 
mm  in  order  to  image  a  GaAs  chip,  it  should  provide  the 
capability  of  interconnection  to  additional  planes  of 
devices  and  must  be  relatively  low  in  price  in  order  to 
keep  system  costs  low. 

Thyristors  are  Lambertian  emitters  and  so  the  numerical 
aperture  (NA)  of  the  optical  system  should  be  as  large  as 
possible  to  provide  efficient  imaging. 

Although  this  investigation  is  motivated  by  the 
requirements  of  PnpN  optical  thyristors  the  conclusions 
are  valid  for  other  optical  systems  which  contain  arrays 
of  LED-type  emitters. 


Gradient  refractive  index  (GRIN)  rod  lenses  have 
several  advantages  for  this  application.  Their  compact 
size  is  compatible  with  that  of  the  optoelectronic  device 
arrays and they have theoretically zero distortion. They
are  simple  to  align  and  the  flat  end  faces  allow  the 
potential  for  direct  integration  with  device  arrays.  Other 
researchers  have  demonstrated  that  it  is  possible  to 
construct  multi-plane  systems  which  contain  several 
GRIN  rods  [2]. 

Fig.  1  shows  the  system  which  is  currently  used  for 
array  to  array  data  transcription  [3].  Two  0.2  pitch 
gradient  refractive  index  (GRIN)  rod  lenses 
(manufactured  by  the  Gradient  Index  Lens  Company) 
are used to image data from array to array. These have a
diameter of Φ = 5 mm, a length of 27.4 mm and an on-axis
NA of 0.19. The index gradient quadratic constant
√A = 0.046. The 10 mm square cube-splitter placed
between  the  two  lenses  allows  data  input  and  output  to 
additional  planes.  By  using  a  working  distance  of  4.3 
mm  light  is  collimated  through  the  cube-splitter, 
minimising  spherical  aberration.  The  total  array  to  array 
system  length  is  65  mm.  With  this  configuration  it  is 
also  possible  to  place  a  Fourier  plane  diffractive  optical 
element at one of the cube-splitter faces, providing fan-out
from plane to plane. This is potentially useful within
information  processing  systems  [4]. 
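
As a consistency check on the lens data above, the following sketch computes the rod length for a given pitch and the axial index implied by the on-axis NA. It assumes that the quadratic constant is quoted in mm^-1, that one full pitch corresponds to a length of 2 pi / sqrt(A), and that the paraxial relation NA = n0 sqrt(A) r0 applies. The axial index is an inference from those relations and is not stated in the text. The function names are illustrative.

```python
import math

def grin_rod_length(pitch_fraction, sqrt_a):
    """Length of a GRIN rod of the given pitch; one full pitch is 2*pi/sqrt(A)."""
    return pitch_fraction * 2 * math.pi / sqrt_a

def on_axis_index(na, sqrt_a, radius):
    """Axial index implied by the paraxial relation NA = n0 * sqrt(A) * r0."""
    return na / (sqrt_a * radius)

length_mm = grin_rod_length(0.2, 0.046)  # sqrt(A) assumed to be in mm^-1
n0 = on_axis_index(0.19, 0.046, 2.5)     # r0 = 5 mm / 2
```

Under these assumptions the computed length lies within about 0.1 mm of the quoted 27.4 mm.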


Figure 1. Array to array imaging with GRIN rod lenses (BS - beam splitter).

We have previously investigated the 3rd order
aberrations  of  this  system  [5].  This  will  be  developed 
further  in  this  paper  by  use  of  exact  ray  tracing 
techniques.  We  will  compare  these  with  experimental 
results  and  consider  the  implications  which  this  has  for 
maximum  array  size  and  density. 


r  (mm) 

Figure  2.  Relative  intensity  as  a  function  of  radial  object 
position  (r)  for  a  range  of  inter-lens  spacing  d.  The 
points  show  experimental  results  for  d=13  mm. 

The maximum number of channels in the system is
given by (W/p)² where W is the array side length and p
is the channel pitch. The switching time of an optical
thyristor is inversely proportional to the optical power
received and so is determined by the insertion loss of the
system. It is therefore necessary to investigate the
variation of spot size ω and insertion loss with both the
location of the source in the object plane and the inter-lens
spacing d.
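
The channel count and the largest object displacement for a square array can be sketched as below. Rounding down to whole channels per side is an assumption added here, and the function names are illustrative.

```python
import math

def max_channels(side, pitch):
    """Channel count (W/p)^2 for a square array, using whole channels per side."""
    per_side = math.floor(side / pitch)
    return per_side * per_side

def corner_displacement(side):
    """Radial distance from the optical axis to the corner of a centred square array."""
    return side / math.sqrt(2)
```

For a 2 x 2 mm array, `corner_displacement(2.0)` reproduces the maximum displacement of about 1.41 mm used later in the text.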

Fig.  2  shows  the  variation  of  insertion  loss  with  object 
displacement  r  for  a  system  which  contains  a  single  10 
mm  cube-splitter.  A  paraxial  ray-tracing  model  is  used 
in  which  the  NA  of  the  GRIN  lenses  is  given  by 


$NA = \sqrt{A}\, n_0 r_0 \left(1 - \frac{A}{2} r^2\right) \left[1 - (r/r_0)^2 \ldots \right]^{1/2}$


where the refractive index profile is given by n(r) = n0(1 - (A/2) r²),
r0 = Φ/2 and where φ is the angular distance from
the y-axis. The effect of vignetting due to the second
lens aperture is also modelled. Experimental results for
d = 13 mm are also given, demonstrating the accuracy of
the  model.  For  a  2  x  2  mm  array  the  maximum 
displacement  of  an  element  from  the  axis  is  r=1.41  mm. 
It  can  be  seen  that  for  this  displacement  a  50%  insertion 
loss  occurs  for  d>20  mm  and  total  obscuration  occurs  at 
d>60  mm.  A  relative  intensity  of  1.0  corresponds  to  an 
insertion  loss  of  -14.9  dB. 


r  (mm) 

Figure 3. Spot diameter ω as a function of object
displacement along the x-axis.

An exact ray-tracing technique was used to investigate
the variation of spot diameter ω (measured along the x
and y axes) with object displacement along the x-axis.
The results of this are shown in Fig. 3, where ω is
defined as twice the RMS width. Here the thyristor is
modelled as a point source. Experimental measurements
of x-axis spot diameter are also shown. It can be seen
that the experimental results show a spot size that is
significantly larger than indicated by ray-tracing. This
may be due to scattering from the surface of the lenses
and will be investigated in more detail. These results
show that the maximum spot-width will be 50-60 µm
for an element at the edge of a 2 x 2 mm array and
approximately 20 µm for an element on the axis.


Figure  4.  Spot  profile  obtained  by  Monte  Carlo 
simulation 

In  order  to  obtain  a  more  accurate  picture  of  the 
distribution  of  light  within  the  image  plane  a  Monte 
Carlo  ray-tracing  simulation  was  performed.  An  optical 
thyristor was modelled as a 4 x 8 µm square in the
object plane which emitted 1500 rays. Fig. 4 shows the
spot profile for two different object positions (Gaussian
curves have been fitted to the data). These results show
that the e^-2 spot diameter is 17 µm for the on-axis
source.

Several conclusions can be drawn from these results. In
order  to  develop  a  compact  optoelectronic  system  the 
imaging  optics  should  not  have  a  significantly  larger 
cross-section  than  that  of  the  device  planes.  The  5  mm 
diameter  of  the  GRIN  lenses  used  here  (currently  the 
largest  available  off-the-shelf)  is  only  slightly  greater 
than  that  of  the  GaAs  optical  thyristor  chips.  However 
elements  at  the  edge  of  the  2  x  2  mm  array  display 
significant aberrations, resulting in a spot size of 50-60
µm. This low resolution is due to the high incidence
angles  of  rays  emitted  by  the  thyristors  [5],  and  these 
lenses  display  much  better  resolution  for  planar 
illumination  [6]. 

A spot size of 60 µm allows an array of 32 x 32
elements to be used, with an array size of 2 x 2 mm.
This density is consistent with a power dissipation of 10
W/cm² for these elements and so a reduced device pitch
is not necessarily desirable in the short term. The results
for spot profile given in Fig. 4 suggest that the minimum
separation of the two halves of a differential pair should
be 10 µm. Further work is however required to
determine  the  effect  of  cross-talk  on  system 
performance. 

Previous  research  [5]  has  indicated  that  the  performance 
of  the  system  may  be  improved  by  using  a  microlens 
array  to  collimate  the  light  emitted  by  the  thyristors 
prior  to  the  GRIN  lenses  (see  Fig.  5).  This  results  in 
smaller  ray  angles  at  the  GRIN  lens  and  hence  reduces 
aberrations.  The  numerical  aperture  of  the  system  will 
be increased and will be more uniform across the array.
The  spot-width  at  the  image  plane  will  be  reduced,  thus 
increasing  the  optical  power  at  each  detector.  The 
disadvantage  of  this  approach  is  an  increase  in  system 
cost  and  complexity.  This  will  allow  some  increase  in 
array  side-length,  but  will  still  be  subject  to  the 
constraints  which  are  imposed  by  vignetting  of  off-axis 
sources. This results in the conclusion that the beam-splitter
dimensions should be of the same order as the
lens diameter if the array size is to be maximised.

We  will  present  more  detailed  results  of  the 
performance  of  the  system,  both  with  and  without 
microlenses,  and  will  discuss  the  implications  which 
these  have  for  system  size,  array  density  and  data 
throughput.

Figure 5. The use of microlens arrays (MLA) to collimate light emitted by the thyristor arrays.

Optical spot array generators are useful for inputs to optical processing and
computing systems. In this paper we demonstrate rib waveguides overlaid with many
micrometer-scale grating areas which produce an array of optical beams. This beam array
generator  can  produce  a  regular  matrix  of  spots  which  processors  such  as  S-SEED  arrays 
require,  as  well  as  spot  patterns  in  less  regular  shapes,  such  as  L-shape,  which  are  useful  for 
other  types  of  processors  [1].  The  grating  outcoupler  approach  offers  an  advantage  over  the 
binary  phase  grating  approach  to  spot  array  generation  because  arbitrary  patterns  can  be 
implemented  with  the  flexible  arrangement  of  the  grating  areas.  In  addition,  this  technique 
is  relatively  insensitive  to  variations  in  input  wavelength  (e.g.  mode-hopping  in  a 
semiconductor  laser)  and  to  temperature  variations  of  the  device.  Finally,  the  highly 
directional  nature  of  grating  outcoupling  yields  beams  with  very  low  divergence,  easing 
alignment  tolerances. 

The  gratings  are  fabricated  in  a  single  step  holographic  photoelectrochemical  etch 
[2].  This  process  circumvents  e-beam  lithography  stitching  errors,  and  has  fewer  processing 
steps  than  photolithographic  grating  formation.  Photoelectrochemical  etching  (PEC)  also 
allows  gratings  of  varying  etched  depth  to  be  made.  This  attractive  possibility  would  allow 
the  outcoupled  spot  intensity  to  be  tailored  to  system  requirements. 

We  have  experimentally  demonstrated  high  efficiency  first  order  gratings  etched  onto 
GaAs/AlGaAs rib waveguides. These outcoupling gratings are pixellated into 10 µm x 10
µm squares, so that an array of outcoupled beams is generated. The device geometry is
illustrated in Figure 1 with a SEM micrograph of the surface relief grating array. In this
micrograph, second order gratings (0.85 µm period) are shown to illustrate the device
configuration. The actual demonstrated outcoupler uses gratings with a 0.35 µm period,
designed to produce a single diffracted order (at λ = 1.064 µm) out of the top of the device,
and  another  single  diffracted  order  into  the  substrate.  All  gratings  were 
photoelectrochemically  patterned  on  the  entire  sample  at  one  time,  in  a  30  second  etch. 

A  demonstration  of  spot  array  generation  was  performed  by  end-fire  coupling  TM 
polarized light (λ = 1.064 µm) into a single rib waveguide and imaging the surface of the
grating  area  with  a  CCD  camera.  The  viewing  angle  of  the  imaging  system  was  swept  from 
-45  degrees  to  +55  degrees  (Figure  2).  Outcoupled  light  was  observed  at  21.90  degrees. 
Outcoupled  light  was  not  observed  at  20.10  degrees,  23.70  degrees,  or  at  any  other  viewing 
angle.  This  narrow  angular  beamwidth  verifies  that  the  grating  produced  a  single  diffracted 
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order  into  the  superstrate.  The  reported  diffraction  angle  agrees  with  the  calculated 
diffraction  angle  to  within  the  measurement  accuracy  of  +/-  4  degrees.  Figure  3  shows  a 
long  row  of  diffracted  spots  from  the  pixellated  outcoupling  grating.  As  seen  in  this  figure, 
a small portion of the guided mode is outcoupled from a rib waveguide at 20 µm intervals.
This  light  is  imaged  with  a  microscope  objective  oriented  at  21.90  degrees  from  normal. 
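
As a rough consistency check on the numbers above, the grating equation for a guided mode of effective index n_eff outcoupled into air through diffraction order m can be written sin θ = n_eff − mλ/Λ. The sketch below inverts it for the reported angle to obtain the effective index that the measurement would imply, and then checks which orders can radiate into air. The effective index is not stated in the text, so inferring it this way, like the function names, is an assumption made only for illustration.

```python
import math

WAVELENGTH_UM = 1.064   # free-space wavelength from the text
PERIOD_UM = 0.35        # first-order grating period from the text
ANGLE_DEG = 21.90       # observed outcoupling angle from the text


def implied_effective_index(angle_deg, wavelength_um, period_um, order=1):
    """Invert sin(theta) = n_eff - m*lambda/period for n_eff (air cover assumed)."""
    return math.sin(math.radians(angle_deg)) + order * wavelength_um / period_um


def radiating_orders_in_air(n_eff, wavelength_um, period_um, max_order=5):
    """Return {order: angle in degrees} for every order with |sin(theta)| <= 1."""
    angles = {}
    for m in range(1, max_order + 1):
        s = n_eff - m * wavelength_um / period_um
        if abs(s) <= 1.0:
            angles[m] = math.degrees(math.asin(s))
    return angles


n_eff = implied_effective_index(ANGLE_DEG, WAVELENGTH_UM, PERIOD_UM)
orders = radiating_orders_in_air(n_eff, WAVELENGTH_UM, PERIOD_UM)
print(f"implied n_eff = {n_eff:.3f}")
print(f"orders radiating into air: {orders}")
```

The implied index of about 3.41 is a plausible value for a GaAs/AlGaAs guide near 1.06 µm, and under this simple model only the first order escapes into air, in line with the single observed beam.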

In conclusion, a single, short, photoelectrochemical etch was used to generate a large
array of 10 µm x 10 µm grating patches. The outcoupling from the grating on these
waveguides demonstrates a simple and versatile method of spot array generation.

Figure 1. A SEM micrograph that illustrates the device geometry. The devices are 10 µm
wide rib waveguides with 10 µm x 10 µm pixellated areas of gratings.
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GaAs/AlGaAs  waveguide 
with  outcoupling  grating 

Figure  2.  The  optical  characterization  setup. 


Figure  3.  A  surface  relief  grating  array  imaged  with  a 
microscope  objective  positioned  at  21.90  degrees  from  normal. 
Light  is  coupled  into  a  single  rib  waveguide  from  the  left  and 
a  1-dimensional  spot  array  is  outcoupled  from  the  waveguide 
surface. 
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Introduction 


Free-space  optical  systems  provide  parallel  access  to  2D-dataplanes  for  data  routing  and 
processing  purposes.  For  the  construction  of  complex  optical  data  processing  systems 
microintegration  is  necessary  to  get  compact  setups  at  acceptable  costs.  These  systems 
should be modular and self-aligning, and should allow an arbitrary number of stages
to be cascaded to construct feedback loops. These requirements are met both by planar and
by stacked microintegration. Using these techniques optical structures can be produced
with lithographic precision on one substrate, thus allowing smart pixel arrays to be
aligned  to  the  optical  system. 

One  task  of  the  optical  system  is  the  imaging  between  smart  pixel  arrays.  Therefore 
light  has  to  propagate  also  parallel  to  the  substrate  surface.  This  can  be  achieved  by 
including  beam  splitters  into  the  optical  system  [2].  In  this  case  the  system  has  to 
consist  of  several  stacked  layers  with  optical  components.  Using  the  planar  integration 
approach,  off-axis  imaging  is  used  to  obtain  lateral  light  propagation.  In  this  case  the 
optical  system  may  consist  of  only  one  single  substrate  layer  [1]  [3].  The  effects  involved 
in  using  this  off-axis  imaging  approach  are  analysed  in  this  paper  for  various  imaging 
configurations.  We  will  show  that  astigmatism  can  be  removed  by  using  a  spherical 
refractive  lens. 


Figure  1.  a)  Off-axis  microsystem  b)  Unfolded  setup 
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Model  for  the  analysis 

A general off-axis imaging system is shown in fig. 1a. A smart pixel array of width w is
placed on a substrate with a thickness t. A lenslet on the opposite side of the
substrate images the device onto the surface of a second device. In this 4f configuration
the  resolution  becomes  maximal  with  diffraction  limited  optics  as  was  shown  in  [2].  The 
mirror  on  top  of  an  aperture  layer  of  thickness  d  is  used  to  reflect  the  light  back  to  the 
lenslet. 

The equivalent unfolded system is shown in fig. 1b. It is obvious that the aberrations
depend  on  the  geometry  of  the  setup.  The  typical  properties  of  microintegrated  systems 
allow  a  classification  with  respect  to  different  lens  types. 


Classification  of  microlenses 

The shape of the lens used for the imaging strongly influences the imaging properties.
Assuming  a  microintegrated  system,  the  lenses  normally  consist  of  one  surface  with  a 
significant  refractive  index  difference.  Three  typical  configurations  are  shown  in  fig.  2. 
Fig.  2a  shows  a  diffractive  lens  which  is  nearly  a  flat  element.  A  refractive  surface  relief 
lens  with  an  air  spacer  before  the  mirror  is  shown  in  fig.  2b  and  in  fig.  2c  a  refractive  ion 
exchange lens produced inside the substrate is depicted.


Figure  2.  Microlens  types;  a)  diffractive  b)  surface  relief  c)  ion  exchange 
lenses 


Ray-tracing analysis

A  ray-tracing  model  covering  these  different  lens  types  is  used  to  analyze  the  typical 
properties of these lenses. A refractive surface with center curvature c is considered at
a distance d from an aperture stop. The on-axis focal length of a single surface is given
by $f = n_2 / ((n_2 - n_1)\,c)$, where n1 and n2 denote the refractive indices in front of and behind
the surface. To calculate the imaging properties of a diffractive flat lens with a mirror
coating, the configuration d = 0, n2 = −n1 = n and c > 0 (positive lens) is assumed,
where n denotes the substrate's index. For the refractive surface relief lens, d = 0, n1 = 1
and n2 = n are assumed. Refractive field-assisted ion exchange lenses are known to have
a narrow region where the refractive index change occurs. This is approximated by a
refractive spherical surface with a change of the refractive index of Δn = 0.1. The stop
distance is therefore d = 1/c.
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Astigmatism  and  curvature  of  field 

Using these assumptions, the astigmatism and curvature of field were calculated for the
three lens types. A plot of the focal length as a function of the incident angle u is shown in
fig. 3. For the refractive ion exchange lens neither astigmatism nor coma occurs because
of the symmetry. This configuration was investigated experimentally.
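
One way to read the symmetry argument is that an aperture stop placed at the center of curvature of a spherical surface forces every chief ray to strike the surface at normal incidence, whatever the field angle. The sketch below uses the standard Coddington equations for a collimated bundle refracted at a spherical surface to show that the sagittal and tangential foci coincide at normal incidence and separate once the chief ray is oblique. The substrate index of 1.5, the unit curvature, and the function name are assumptions chosen for illustration; only Δn = 0.1 comes from the text.

```python
import math


def oblique_foci(n1, n2, c, incidence_deg):
    """Sagittal and tangential focal distances (along the chief ray) of a
    collimated bundle refracted at a spherical surface of curvature c."""
    i1 = math.radians(incidence_deg)
    i2 = math.asin(n1 * math.sin(i1) / n2)                # Snell's law
    power = c * (n2 * math.cos(i2) - n1 * math.cos(i1))   # oblique power
    sagittal = n2 / power
    tangential = n2 * math.cos(i2) ** 2 / power
    return sagittal, tangential


N1, N2, C = 1.5, 1.6, 1.0   # assumed substrate index, delta n = 0.1, unit curvature

# Chief ray at normal incidence (stop at the center of curvature).
s0, t0 = oblique_foci(N1, N2, C, 0.0)
# Oblique chief ray, here with 10 degrees of incidence.
s10, t10 = oblique_foci(N1, N2, C, 10.0)

print(f"normal incidence: sagittal = {s0:.3f}, tangential = {t0:.3f}")
print(f"10 deg incidence: sagittal = {s10:.3f}, tangential = {t10:.3f}")
```

The difference between the two distances in the oblique case is the astigmatic focal separation; it vanishes in the normal-incidence case.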
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Figure  3.  Angular  dependency  of  the  normalized  focal  length 
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Figure  4.  Experiment:  a)  Setup  b)  Object  c)  Image  of  the  object 


The off-axis imaging setup is shown in fig. 4a. It consists of an ion-exchanged
microlens with spherical refractive index distribution (f = 2.4 mm in glass), an aperture
stop (D = 200 µm) and a mirror. A mask is projected in the front focal plane of the
microlens. This object is imaged by the microlens back into the same plane. The
projected mask and its image, observed with a microscope, are shown in fig. 4. The image
with  dimensions  of  AdOfim  •  SOOfim  is  free  of  coma  and  astigmatism  as  expected  from  the 
theoretical  analysis.  The  defocus  caused  by  the  remaining  curvature  of  field  is  smaller 
than  the  diffraction  spread. 

Further aberration analysis including coma and spherical aberration and possibilities
for their compensation will be presented.
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Optical  memory  systems  utilizing  volume  holographic  storage  (VHS)  have  received  considerable  attention  in  recent 
years [1-3]. One aspect of VHS is the ability to copy stored data from an archive (a VHS medium containing N
permanently-stored  data  sets)  to  a  secondary  storage  medium  (SSM).  To  achieve  the  copying  of  volume  holographic 
memories  from  the  archive  to  the  SSM,  three  primary  approaches  can  be  utilized:  parallel  [^^,  incremental  and 
sequential  [^].  In  the  parallel  approach,  there  exist  N  mutually  incoherent  read  beams  that  enter  the  archive  and  read 
out  and  copy  all  N  stored  data  sets  simultaneously.  In  the  incremental  approach,  there  exists  a  single  coherent  read 
beam  that  is  rapidly  multiplexed  over  time  (in  angle,  wavelength,  phase,  et  cetera)  in  a  manner  such  that  it  reads  out 
and  copies  all  N  data  sets  serially,  over  many  repeated  iterations  in  which  each  iteration  time  is  short  compared  to 
the  SSM’s  response  time.  In  the  sequential  approach,  there  also  exists  a  single  coherent  read  beam  that  is 
multiplexed  over  time  (in  angle,  wavelength,  phase,  et  cetera)  but  such  that  it  reads  out  and  copies  the  N  data  sets 
sequentially  in  a  manner  that  follows  an  appropriate  (single-pass)  exposure  schedule.  In  this  paper,  we  explore  the 
fundamental  material  limitations  in  each  of  these  copying  approaches  for  the  cases  of  all-optical,  quasi  all-optical, 
and hybrid opto-electronic copying schemes. These limitations include the maximum allowable intensity, Ia, into
the archive that will not damage it, the resulting total intensity, Id, of the diffracted beam from the archive as a
function  of  N,  Ia,  and  other  archival  material  and  geometry  constraints,  dark  conductivity,  shot  and  Kerr  noise  in  the 
SSM,  cross-talk,  and  the  characteristics  of  any  devices  that  may  be  placed  between  the  archive  and  the  SSM. 

We begin by noting the relations for the optical writing of N holograms into a storage medium to a desired
index modulation depth per hologram of Δn(t_w) in a total write time t_w for all three approaches mentioned above.
These can be expressed (for large N) as

$\Delta n(t_w) = \ldots$  (1)    $t_w = -\tau \ln(1 - \alpha)$  (2)

where Δn_sat is the saturation index modulation for the writing of a single hologram, α is a number (less than but
close to one) that defines an acceptable fraction of achievable index modulation attained during a total write time t_w,
and τ is the storage medium's response time (intensity dependent).
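
To give a feel for relation (2), the short sketch below evaluates t_w = −τ ln(1 − α) for a few target fractions α, with the response time τ normalized to one. The function name and the chosen α values are illustrative assumptions.

```python
import math


def total_write_time(tau_s, alpha):
    """Relation (2): t_w = -tau * ln(1 - alpha), for 0 < alpha < 1."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return -tau_s * math.log(1.0 - alpha)


for alpha in (0.5, 0.9, 0.99):
    print(f"alpha = {alpha:4.2f}: t_w = {total_write_time(1.0, alpha):.2f} tau")
```

For α = 0.9 the write time is about 2.3 response times, and it grows only logarithmically as α approaches one.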

When  attempting  to  copy  holograms  from  an  archive  to  a  SSM,  one  must  consider  the  realistic  limitations 
of  these  storage  media.  For  the  archive,  the  primary  concern  is  its  ability  to  dissipate  heat  absorbed  from  the  read 
beam(s)  during  the  recall  process.  For  archival  media  such  as  iron-doped  lithium  niobate,  absorption  effects 
typically limit Ia to less than ~10 W/cm². As well, a secondary concern will be the tendency of the read-out
illumination to gradually erase the archive’s memory (even for “fixed” holograms). Higher read-out intensities will
shorten the archive’s lifetime and therefore limit the number of copies that can be made from it.

In transferring data from the archive to the SSM, one must then consider the total intensity of the diffracted
light, Id, exiting the archive. The equation for this value is generally dominated by a 1/N² dependence such that

$I_d \approx I_a\,(\eta_0 / N^2)$  (3)

where η0 is the optimal diffraction efficiency attained for the archival storage of a single hologram. One drawback
of archiving stored data sets is that the archiving (fixing) process reduces the diffraction efficiency of the stored
holograms. For holograms initially stored in LiNbO3 at room temperature and then thermally fixed, η0 < 10^-1.
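
The 1/N² scaling in relation (3) can be made concrete with a few numbers. The sketch below assumes the read intensity sits at the roughly 10 W/cm² limit quoted earlier for iron-doped lithium niobate and takes η0 = 0.1 as an illustrative upper value; both choices, and the function name, are assumptions rather than results from the text.

```python
def diffracted_intensity(i_archive, eta_0, n_holograms):
    """Relation (3): I_d ~ I_a * eta_0 / N^2, intensities in W/cm^2."""
    return i_archive * eta_0 / n_holograms ** 2


I_A = 10.0    # W/cm^2, assumed to sit at the quoted archive limit
ETA_0 = 0.1   # assumed single-hologram efficiency after fixing

for n in (100, 1_000, 10_000):
    i_d = diffracted_intensity(I_A, ETA_0, n)
    print(f"N = {n:>6}: I_d ~ {i_d:.1e} W/cm^2")
```

Each tenfold increase in N lowers the diffracted intensity a hundredfold, which is why the copy time becomes the limiting quantity for large archives.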

The ability to copy data sets to the SSM then becomes further limited by the material parameters of the
SSM. In terms of the writing/copying response time, τ, Yeh has shown there to be a fundamental limit for
photorefractive media that is related to the total incident intensity. The total intensity into the SSM can be expressed
as I_SSM = Ir + G·Id, where Ir is the SSM’s reference beam intensity and G is any optical gain that may exist
between the archive and the SSM. The photorefractive response time, τ, is therefore fundamentally limited to be

$\tau \sim \left(\frac{h\nu}{e}\right)\left(\frac{1}{I_{SSM}}\right)\left(\frac{\lambda}{\Lambda}\right)\left(\frac{\Gamma}{\alpha_a \eta}\right)\left(\frac{\varepsilon}{n^3 r}\right)$  (4)

where (hν/e) is the incident energy per photon, λ is the optical wavelength, Λ is the grating spacing, Γ is the
material’s coupling constant, α_a is its optical absorption coefficient, η is its quantum efficiency, ε is its permittivity, n
is its background index of refraction, and r is its effective electro-optic coefficient. For LiNbO3:Fe, τ·I_SSM ≈ …
[W·s/cm²]. Another limiting material parameter, dark conductivity, becomes a serious issue when there are
approximately as many charge carriers photoexcited by the incident intensity as there are thermally excited. In such
a case, the dark conductivity, σ_d, approximately equals the photoconductivity, σ_p, where

$\sigma_d = \sigma_0 T^{-1} \exp(-E_a / k_B T)$  (5)    $\sigma_p = \ldots$  (6)

in which σ0 is a crystal dependent constant, T is the temperature, E_a is the charge carrier’s activation energy, k_B is
Boltzmann’s constant, and s is the photoionization cross section. As well, it has been shown that reducing σ_d
increases τ. There are also effects due to shot, photorefractive, and Kerr noises to consider, which limit the dynamic
range of the storage medium utilized. When one or both of the interfering write beams is of significantly low
intensity, shot noise (due to quantum fluctuations in the total number of interacting photons) and photorefractive
thermal noise (as in the case of dark conductivity mentioned above) may become prevalent, limiting factors in a
material. As well, Kerr noise, a type of thermal noise that is responsible for random fluctuations of the
optical-frequency dielectric permittivity arising through the optical Kerr effect, may become important. In recent studies of
ferroelectrics, Chang et al. found that Kerr noise was the dominant noise source of the three. However, their
calculated fundamental limits due to these three noise sources gave values three or more orders of magnitude below
their system’s detectability limits. Furthermore, cross-talk noises arising and enhanced during the writing process in
both the archive and the SSM may also limit N by decreasing the signal-to-noise ratio (SNR) in either.
Depending on system geometries, any of these effects may fundamentally limit the copying process. In an
all-optical copying scheme, in the parallel and incremental approaches (G limited to ~1), to copy N = 10,000
holograms to α = 90% would require a total time t_w ~ 10^8 sec, or about 3.2 years. If the sequential approach is
utilized, then G > 1 is possible (i.e., two-wave mixing, material limited to Iτ ~ 10^… W·s/cm²), and with G·Id ~
10 mW/cm² (G ~ 10^…), N = 10,000 and α = 90%, t_w ~ 100 sec. In a quasi all-optical scheme, we allow for an
optically addressed spatial light modulator (OASLM) to exist between the archive and the SSM. Utilizing these
devices, only the sequential approach is practical. We show that even for the best OASLM reported (an a-Si:H FLC,
Sm C*), resolution, contrast and sensitivity limitations require N ~ 200 for SNR ~ 2 in the OASLM. With
G·Id ~ 100 mW/cm², the total copying time (for α = 90%) would be t_w ~ 5 sec. These numbers may improve
as OASLMs improve. Finally, in a hybrid opto-electronic scheme, we allow for a CCD to detect Id, and its output
in turn drives an LCTV or similar type device. As in the quasi all-optical scheme, only the sequential approach is
practical in the hybrid scheme. For SNR > 2, modern CCDs require Id < 10^… W/cm² at 100 Hz frame rates.
This allows for N > 250,000. If such a system operates at 1 kHz frame rates at N = 10,000 (assuming ~1 GHz clock
frequencies, 10^6 pixels, and G·Id = 5 W/cm²) then copying to α = 90% would require t_w ~ 10 sec. As well,
thresholding between the CCD and LCTV can allow for a significant increase in the copied holograms’ SNR.

We  have  theoretically  investigated  fundamental  limitations  in  volume  holographic  copying.  Our  results 
indicate  that  opto-electronic  sequential  copying  is  superior  to  all  other  approach  combinations.  The  sequential 
scheme  is  superior  in  all  three  approaches;  the  incremental  scheme  is  possible  in  all  three  approaches  but  is 
impractical  due  to  severe  update  delays  (>10  times  as  many  as  in  the  sequential  scheme);  the  parallel  scheme  is  only 
possible  in  the  all-optical  approach,  but  is  very  impractical  because  it  is  excessively  slow  for  respectable  N. 
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1.  Introduction 

Scientific  computation  typically  involves  processing  large  amounts  of  data  in  which  operations  using  main 
memory  and  mass  storage  are  frequent.  A  performance  bottleneck  to  and  from  main  memory  arises  because 
only  a  small  number  of  input  and  output  pins  are  provided  on  electronic  memory  integrated  circuits,  which 
means  that  large  portions  of  the  memory  cannot  be  accessed  in  parallel.  This  bottleneck  is  compounded  by 
optical memories, in which entire pages can be brought into a system in parallel. Here, we describe a concept
architecture that improves the access time to a parallel memory through the use of an optical interface.

Figure 1a illustrates the model for the optical memory interface. At the lowest level, data objects of various
sizes are stored in an optical recording medium. Rectangular areas are illuminated in parallel, and the readout
beams from the storage plane are redirected to a staging plane where data objects tile a regularly shaped
area. In Figure 1a, three rectangularly shaped data objects that are arbitrarily placed in the storage plane are
imaged onto the staging plane so that they fit tightly within a single rectangle.

After the data objects are organized in the staging plane, they are distributed in parallel in the optical distribution
plane to a host system through a parallel read/write window. The reverse process is used when writing
in parallel. The beam redirection can be performed with a beam-blocking approach [1], in which the
beams are fanned to a number of locations, and selective blocking controls the target locations of the beams.
In an alternative low latency approach, the redirection can be performed with beam-steering [2].

An advantage of this memory organization is that once the beam-blocking/beam-steering mechanism is
configured, parallel reads and writes can be made indefinitely without incurring the overhead of reconfiguration.
Parallel memory traffic patterns repeat [3], and so even a slow reconfiguration mechanism can be
effective if a high bit rate is maintained after reconfiguration.


2.  The  Model 

Figure 1b shows the external view of the parallel memory interface. The ADDRESS port is used for a logical
address that is internally mapped to a physical location. The correspondence between logical and physical
addresses may change during operation. For parallel addressing, the ADDRESS port holds the starting address
of a block, and the SIZE port holds the size of the block that is being accessed, excluding the first
address. Thus, to access the first four logical addresses in the memory in parallel, the ADDRESS port should
be 0, and the value on the SIZE port should be 3.

Figure 1: (a) Model for the reconfigurable optical memory interface; (b) external view.

The DATA-IN and DATA-OUT ports transfer scalar (single word) data between the memory and an electronic
host. The OPTICAL DATA-IN and OPTICAL DATA-OUT ports transfer block data between the
memory and an optical storage device. The value at the MODE port can take on one of six values:

0  (Read)  or  1  (Write):  Perform  a  scalar  Read  or  Write  on  the  memory  location  at  the  ADDRESS  port. 

2  (Parallel  Read)  and  3  (Parallel  Write):  Perform  a  block  Read  or  Write.  The  block  appears  at  the  OPTICAL 
DATA-OUT  port  or  is  read  from  the  OPTICAL  DATA-IN  port  as  appropriate  for  the  operation. 

4  (Internal  Parallel  Copy  -  IPC):  Copy  a  block  of  words  from  one  section  of  memory  to  another. 


5  (Contiguize):  Memory  locations  are  moved  so  that  they  are  physically  contiguous  (that  is,  they  fill  a 
block),  while  retaining  their  logical  addresses.  This  makes  subsequent  parallel  reads,  parallel  writes,  and 
IPCs  on  arbitrarily  shaped  data  objects  more  efficient.  When  the  Contiguize  mode  is  maintained,  the  memory 
treats  every  new  address  as  an  addition  to  the  object.  When  the  MODE  field  changes,  the  object  is  then 
accessible  in  parallel.  This  function  is  useful,  for  example,  when  accessing  a  sparse  matrix  in  its  entirety. 

Data  objects  are  reshaped  during  operation  to  conform  to  a  simple  tree  structure,  as  shown  in  Figure  2.  The 
N word memory (N = 16 for this example) is fanned in through a log2 N tree cascade to extract an entire
subtree  of  words  (1, 2, 4, 8,  or  16  for  this  example).  The  extracted  object  is  then  distributed  through  a  fan-out 
cascade.  At  each  stage,  data  objects  are  either  combined  or  split  apart,  or  are  sent  straight  through  without 
any  fan-in  or  fan-out,  as  directed  by  the  Parallel  Decoder.  In  this  way,  arbitrary  data  objects  can  be  selected 
that  are  an  integral  power  of  two  in  size,  that  fall  on  integral  power  of  two  boundaries  in  the  address  space  of 
the  memory.  Arbitrarily  shaped  objects  placed  at  arbitrary  boundaries  in  the  memory,  however,  cannot  be 
directly  manipulated.  We  address  this  problem  in  the  next  section  through  a  series  of  parallel  accesses. 

3.  Reshaping  Memory  for  Efficient  Parallel  Access 

A four level deep decoder tree for a 16-word memory is shown in Figure 3. As an example of how the
decoder tree works, the address 1011 is presented at the root node (the top level of the tree). The leftmost bit
in the address is a 1 so the right path is traversed at Level 0 as indicated by the arrow. The next bit is a 0 so
the left path is traversed at Level 1. The next bit is a 1 so the right path is traversed at Level 2, and the last bit
is a 1 so the rightmost path is traversed and the addressed leaf 1011 is reached at Level 3.


We introduce the use of dual-rail logic, in which a logical 0 is represented by the spatial pair 0-1 (dark-light)
and a logical 1 is represented by the spatial pair 1-0 (light-dark). The 1011 single-rail address becomes
1-0 0-1 1-0 1-0 in dual-rail logic. If we allow both sides of a dual-rail decoder tree to be traversed
simultaneously, by forcing both bits of a dual-rail bit-pair to be 1, then the size of the accessed data object
doubles. This is an important aspect of the memory addressing scheme.

Figure 2: Model of parallel accessing structure (Fan-In Cascade, Fan-Out Cascade).

Figure 3 shows a decoding path when both bits of the two rightmost dual-rail bit-pairs are set to 1. The four
words at locations 1000-1011 are accessed in parallel. This addressing scheme thus offers a potential for
accessing a parallel memory in a useful way, rather than simply sending a raw data
block to a host processor that would then be forced to reformat it.
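
The dual-rail addressing described above can be sketched in a few lines. The example encodes a single-rail address as (1-rail, 0-rail) pairs and lists every leaf reached when a lit rail selects the corresponding branch, so that a pair with both rails lit follows both branches. The function names and the tuple representation are assumptions made for illustration.

```python
from itertools import product


def to_dual_rail(address_bits):
    """Encode a single-rail bit string: '1' -> (1, 0), '0' -> (0, 1)."""
    return [(1, 0) if bit == "1" else (0, 1) for bit in address_bits]


def addressed_words(pairs):
    """List every leaf address reached when each lit rail selects a branch."""
    branches = []
    for one_rail, zero_rail in pairs:
        options = []
        if zero_rail:
            options.append("0")   # second rail lit: take the left path
        if one_rail:
            options.append("1")   # first rail lit: take the right path
        branches.append(options)
    return ["".join(path) for path in product(*branches)]


pairs = to_dual_rail("1011")
print(addressed_words(pairs))       # a single word

pairs[2] = pairs[3] = (1, 1)        # light both rails of the two rightmost pairs
print(addressed_words(pairs))       # four words accessed together
```

With both rails lit in the two rightmost pairs, the decoder reaches the four leaves 1000 through 1011, matching the example in the text.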

The IPC Algorithm shown below decomposes an arbitrarily shaped region into the minimum number of
subregions that are accessed in succession, making use of the dual-rail addressing scheme. The IPC Algorithm
starts by assuming a block of words to be accessed fits exactly into a power of two subtree, taking into
account the positions of the boundaries. It then successively decomposes the block until the sub-blocks fit
within the boundaries. Adjacency is considered for Cartesian space only, as shown in Figure 3 for a four-word
block, and not for Hamming space which can be more efficient.

IPC Algorithm

The ADDRESS port holds the Source address encoded in dual-rail logic.

The DATA-IN port holds the Target address encoded in dual-rail logic.

The SIZE port holds the dual-rail size of the block of memory to copy.

Function FillsSubtree(Address, Size) returns TRUE if the block at Address exactly
fills a power-of-2 subtree on a Size boundary; returns FALSE otherwise.

Temp <- Size;
LOOP: If FillsSubtree(Source, Temp) AND FillsSubtree(Target, Temp)
{                                         // Read a block from the memory using a
    Parallel_Read(Source XOR Temp/2);     // "disallowed" dual-rail address.
    Parallel_Write(Target XOR Temp/2);    // Write the block back to Target.
    Source <- Source + Temp + 1;
    Target <- Target + Temp + 1;
    Size <- Size - (Temp + 1); Temp <- Size;
} Else Temp <- Temp / 2;
If Size >= 0 then GOTO LOOP; Else DONE.
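
The control flow of the listing can be exercised with a small runnable sketch. It works on plain integer addresses rather than dual-rail ones, so the XOR step that forms the "disallowed" dual-rail address is not modelled, and it assumes the loop continues while Size is non-negative, since Size excludes the first address. The function names and the returned (source, target, words) tuples are illustrative assumptions.

```python
def fills_subtree(address, size):
    """True if the size+1 words starting at address form an aligned
    power-of-two subtree (size excludes the first address)."""
    words = size + 1
    return words & (words - 1) == 0 and address % words == 0


def ipc_decompose(source, target, size):
    """Follow the listing and return the list of (source, target, words) copies."""
    copies = []
    temp = size
    while size >= 0:
        if fills_subtree(source, temp) and fills_subtree(target, temp):
            copies.append((source, target, temp + 1))
            source += temp + 1
            target += temp + 1
            size -= temp + 1
            temp = size
        else:
            temp //= 2   # try a smaller sub-block
    return copies


# Four words at 1000-1011 copied to address 0: one aligned block suffices.
print(ipc_decompose(8, 0, 3))
# Six words starting at address 2 copied to address 10: two sub-blocks.
print(ipc_decompose(2, 10, 5))
```

The first call returns a single four-word copy; the second splits the region into a two-word block followed by a four-word block, because the starting address 2 is only aligned to a two-word boundary.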

The  work  reported  here  was  jointly  supported  by  AFOSR  and  NSF  on  grant  ECS  93-12625. 
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A  potential  advantage  of  communicating  information  optically  rather  than  electrically  is 
the ease with which non-local interconnection paths may be set up. The radix 2 is a general
purpose interconnection pattern of potential use in optical computing architectures. A 2-dimensional
array of nodes is interconnected to nearest neighbours, neighbours 2 pixels distant,
4 pixels away, etc., in both the vertical and horizontal directions. Each node may be a processing
cell implemented using smart pixel technology. This interconnection scheme minimises the
number of cycles required to solve algorithms involving recursive doubling such as the bitonic
sort and fast Fourier transforms.
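
The connectivity just described can be enumerated directly. The sketch below lists, for one node of a square array, the nodes reached at distances 1, 2, 4, and so on along its row and column. It assumes there is no wrap-around at the array edges, which the text does not specify, and the function name is illustrative.

```python
def radix2_neighbours(row, col, size):
    """Nodes reached from (row, col) at distances 1, 2, 4, ... along the
    row and the column of a size x size array."""
    neighbours = []
    step = 1
    while step < size:
        for d_row, d_col in ((-step, 0), (step, 0), (0, -step), (0, step)):
            r, c = row + d_row, col + d_col
            if 0 <= r < size and 0 <= c < size:   # assumed: no wrap-around
                neighbours.append((r, c))
        step *= 2
    return neighbours


print(radix2_neighbours(0, 0, 8))
```

For an 8 x 8 array there are three hop distances (1, 2 and 4), so a corner node has six neighbours and an interior node has up to twelve.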

This  interconnection  pattern  is  most  simply  implemented  using  a  fixed  phase  grating 
which  provides  the  fan-out  and  a  large  detector  or  an  array  of  small  detectors  which  provide  the 
fan-in.  Such  a  fixed  fan-out  scheme  suffers  from  a  considerable  power  loss  penalty.  In  addition, 
the  capacitance  of  the  detectors  would  lead  to  a  further  power  penalty  unless  the  system 
bandwidth  is  compromised.  Power  loss  is  important  as  the  available  laser  power  typically  limits 
the interconnection bandwidth and high power lasers are unavailable, expensive and/or
unreliable. To overcome these drawbacks, it is interesting to consider the use of a
reconfigurable  interconnect  and  a  single  detector  to  minimize  the  power  required. 

The only reconfigurable interconnect technology available to us is a phase-only nematic
liquid crystal device (SLM). The pattern displayed (figure 1) on this modified Seiko-Epson
projection display can be reconfigured at 30 Hz. A dynamic interconnect is only of interest if it
can be reconfigured at a rate comparable to the system clock frequency. However, the
experiments  we  are  carrying  out  are  of  interest  in  investigating  generic  issues  such  as 
alignability,  loss  and  crosstalk.  As  these  are  understood,  the  characteristics  of  a  viable  device 
based  on  phase  modulation  by  a  semiconductor  SLM  will  be  more  fully  appreciated. 

The system constructed is shown in figure 2. A photograph of the system is shown as
figure 3. It consists of a high-power single-stripe diode laser (850 nm) with an external
diffraction grating providing wavelength stability and selection [1], and a cascade of 50/50 beam
splitters to provide two sets of 2 beams (which will illuminate the S and R windows of a
symmetric-SEED) propagating at slightly different angles when incident on a binary phase
grating (BPG). The BPG splits the two beams into an array of 8 x 8 sets of 2 beams. Each array of
128  beams  is  transmitted  by  a  PBS/QWP  and  a  42  mm  lens  before  being  incident  on  an  array 
of  S-SEEDs.  The  reflected  light  is  incident  on  the  other  S-SEED  array  after  passing  through 
the  reconfigurable  interconnect.  All  of  the  components  are  mounted  on  a  steel  slot-plate 
providing  an  extremely  rigid,  compact  and  inexpensive  baseplate. 

The channel pitch in the S-SEED plane is 160 µm. The separation between the two
beams incident on each 5 µm by 10 µm window of the S-SEED is 20 µm. S-SEEDs were used
as no smart pixels were available. In a future system under construction, it is intended that the
pitch be increased to >200 µm and that the devices be replaced with InGaAs/CMOS smart
pixels that perform an exchange bypass operation on the incident data.

The pitch of the pixels in the SLM is 56 µm by 46 µm in the vertical and horizontal
directions respectively. To be practical in a system, it is desirable that the SLM pixel pitch be
reduced considerably. The SLM is an analog device capable of a phase change of >π/2 at 850
nm. A grating phase depth of π/2 produced a fan-out of 2, with 81% of the input power going to the
first order beams. A phase depth of π/3 produces a fan-out of three with equal power going
into the zero order and first order beams.
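
The quoted efficiencies can be checked against the standard scalar model of a binary phase grating. The sketch below assumes a 50% duty cycle and reads "phase depth" as half of the peak-to-peak phase step (alternate pixels at +depth and -depth); both are assumptions made for illustration, not statements from the summary, and the function name is hypothetical.

```python
import math

def order_efficiency(m, phase_depth):
    """Fraction of input power in diffraction order m of a 50% duty-cycle
    binary phase grating whose two levels are +phase_depth and -phase_depth
    (scalar theory; assumed model, not taken from the summary)."""
    if m == 0:
        return math.cos(phase_depth) ** 2
    if m % 2 == 0:
        return 0.0  # even non-zero orders vanish for a 50% duty cycle
    return (2.0 / (m * math.pi)) ** 2 * math.sin(phase_depth) ** 2

# Fan-out of 2: power shared by the +1 and -1 orders at a depth of pi/2
two_way = 2 * order_efficiency(1, math.pi / 2)

# Fan-out of 3: zero order and one first order at a depth of pi/3
zero_order = order_efficiency(0, math.pi / 3)
first_order = order_efficiency(1, math.pi / 3)

print(two_way, zero_order, first_order)
```

Under this model a depth of π/2 puts about 81% of the power into the two first orders, in line with the figure quoted above. At π/3 the zero order and each first order carry comparable (not identical) fractions; the exactly equal split in this idealized model occurs at a depth of arctan(π/2), close to π/3.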

The lenses used were hybrid combinations of a conventional 42 mm focal length HA
triplet custom-designed for this experiment and a compound afocal 3:1 microlens telescope
formed from three microlens arrays glued together. This combination is used to provide an f/1.1 lens
with a large field of view. The lens performance has been measured and it provides spots of <5
µm diameter over a field of 5.2 mm by 5.2 mm [2]. An improved version of this lens is currently
being developed in our laboratory which uses a 4-element HA lens with a field of 11 mm square
in conjunction with a 3:1 microlens telescope. In this experiment, the hybrid lens we designed
worked satisfactorily. The afocal telescope is required to accommodate the dual-rail operation
of the S-SEEDs. The lens could be considerably simplified if a single-rail transceiver were used.
Recent improvements in SEED performance indicate that single-rail operation will be used in
subsequent systems. In that case, a singlet microlens can be used as a concentrator to increase
the power of the slow conventional lens.

The choice of channel pitch (160 µm) is linked to the SLM pixel pitch (46 µm and 56
µm). The period (P) of the grating on the SLM must be an even integer number (n) of SLM
pixels; n must be even since each period consists of one pixel on (π/2 phase depth) and the next
pixel off (zero phase depth). In addition, this period must satisfy the grating equation P = 2λf/S,
where λ is the wavelength, f is the focal length of the lens (42 mm) and S is the spot separation produced by the
grating in the plane of the lens. S must be an integer multiple of the device pixel pitch (160
µm). The positive and negative first orders from the SLM grating are used. With the 56 µm
SLM pitch, S = 316 µm for n = 4 and S = 632 µm for n = 2. The 10 µm length of the S-SEED
window accommodates the small difference between these values of S and integer multiples of
the device pitch. This small difference is useful since it ensures that the low power higher
orders generated by the SLM grating do not fall on the windows.
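
As a rough check of the grating equation, the sketch below evaluates S = 2λf/P for the two grating periods discussed. The nominal 850 nm wavelength and 42 mm focal length are taken from the text; the function name and default arguments are illustrative assumptions.

```python
def spot_separation_um(n_pixels, pixel_pitch_um,
                       wavelength_um=0.85, focal_length_mm=42.0):
    """Separation S between the +1 and -1 order spots, from P = 2*lambda*f/S,
    for a grating period of n_pixels SLM pixels."""
    period_um = n_pixels * pixel_pitch_um
    return 2.0 * wavelength_um * (focal_length_mm * 1000.0) / period_um

for n in (4, 2):
    s = spot_separation_um(n, 56.0)
    multiple = round(s / 160.0)          # nearest multiple of the channel pitch
    print(n, s, multiple * 160.0 - s)    # residual to be absorbed by the window
```

With the nominal 850 nm wavelength this gives about 319 µm and 638 µm, slightly larger than the 316 µm and 632 µm quoted above; the quoted values would correspond to an operating wavelength of roughly 843 nm. Either way the residual with respect to 320 µm and 640 µm is a few microns, which is the small offset that the 10 µm window is said to accommodate.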

It is clear that the existing devices are unsuitable to obtain the full radix 2 interconnect;
only the first two stages of it can be generated. It would be possible to generate other stages of
the interconnect if the SLM pitch is reduced. Liquid crystal devices with a pixel pitch of <20
µm have been developed.

The experiment has been operated in only a limited fashion so far due to the failure of
one of the S-SEEDs, which will be replaced. It is anticipated that we will be able to present
results of the operation of this system at the meeting. Measurements of uniformity and loss
will be made.

[1]  J.M.  Sasian  et  al,  "Frequency  control,  modulation,  and  packaging  of  an  SDL  (100  mW) 
laser  diode",  in  Technical  Digest  of  OSA  Topical  Meeting  on  Optical  Design  for  Photonics, 
March  22  1993, 

[2]  F.  Tooley  et  al,  “The  implementation  of  a  hybrid  lens,”  submitted  to  Applied  Optics  Oct. 
1994.

Figure 3: Photograph of reconfigurable interconnection system

Figure 2: Schematic of the reconfigurable interconnection system

1. Introduction


The use of optical interconnects for backplane and bus applications in multiboard processor
systems has been considered by a number of investigators.[1,2] However, most of these analyses
have not evaluated the impact of optics on specific bus configurations. In this presentation we
evaluate the performance of optics in a Futurebus+ backplane, and show that a considerable
reduction in data transfer delay can be obtained using existing electro-optic interface elements in an
optical backplane. The significance of this result is that a substantial improvement of existing bus
architectures can be achieved using a straightforward substitution of electro-optic components for
electrical transceivers.

2.  Asynchronous  Data  Transfer  Protocol  in  the  Futurebus+  Backplane 

Futurebus+ [3] is a revised and extended version of the original Futurebus standard (IEEE 896.1
-1987). It is a high-performance, industry-standard backplane specification for multiprocessor
system and I/O buses. Futurebus+ can support a maximum 256 bit data path, and a maximum
transfer rate of 3.2 GBytes/s.^ A typical data transfer protocol for Futurebus+ is analyzed to
show the delay associated with the data transfer. Figure 1 shows one cycle of data transfer in
compelled mode^ between sender and receiver modules, and illustrates the basic handshaking
between the two modules.


S:  SENDER  MODULE 
R:  RECEIVER  MODULE 
1:  TRANSMITTER  INPUT  AT  S 
2 :  TRANSMITTER  OUTPUT  AT  S 
3:  RECEIVER  INPUT  AT  R 
4:  RECEIVER  OUTPUT  AT  R 
5:  TRANSMITTER  OUTPUT  AT  R 
6:  RECEIVER  INPUT  AT  S 
7:  RECEIVER  OUTPUT  AT  S 
NEW  CYCLE  BEGINS 
A, B, C, D, E, F: DELAYS INDUCED AT EACH STAGE


All the bus transmitters and receivers are specified with maximum and minimum switching
delays. Since the sender module must guarantee that valid data is on the bus prior to changing the
SYNC signal, it must wait for an amount of time equal to the maximum delay of its transmitters (A
and D in Fig. 1). The same situation arises on the receiver module, which must wait for the
maximum delay of its receivers (C in Fig. 1). As a result, the difference in delays introduced by
each transceiver, or skew, is more important than the absolute value of delay to the overall
performance.

3.  Data  Transfer  Time  in  the  Futurebus+  Backplane 

The electrical line lengths and data rates on the Futurebus+ backplane (30 cm for a 10 board
system and 50-100 MHz) require transmission line analysis. Figure 2 shows a strip line
configuration^ for the electrical backplane model, and is used to determine the propagation delay
along the backplane.


Fig. 2. Cross section of the backplane for one signal line (labels in the figure: ground planes, dielectric, substrate).

Transmission lines can be specified by a characteristic impedance, $Z_o = \sqrt{L_o/C_o}$, and a
propagation delay per unit length, $t_o = \sqrt{L_o C_o}$, where $L_o$ is the unloaded distributed inductance per
unit length, and $C_o$ is the unloaded distributed capacitance per unit length of the line. From the
IEEE specification,^ $Z_o = 57\ \Omega$ (52 Ω to 62 Ω), and $C_o$ = 29 pF/ft = 0.95 pF/cm. Using these
values, the propagation delay per unit length of a backplane without loads is determined by
$t_o = \sqrt{L_o C_o} = Z_o C_o = 54.15$ ps/cm.

All the boards connected to the backplane will load the line with the capacitances associated
with their connectors, vias, board traces, and transceivers. As a result, the load capacitance per
unit length due to these boards will affect the characteristics of the unloaded backplane. Almost all
vendors use BTL (Backplane Transfer Logic) transceivers to meet the Futurebus+ specification.
Since the IEEE standard^ specifies that the maximum capacitance for a BTL transceiver should be 5
pF, a reasonable estimate of the load due to a single board is about 10 pF.[4] When there are 10
boards connected to the 30 cm backplane, using the typical separation of 3 cm between boards, the
load capacitance per unit length is $C_L = 3.3$ pF/cm.

For a loaded backplane the characteristic impedance ($Z_L$) and the propagation delay per unit length
($t_L$) can be obtained by replacing $C_o$ with $C_o + C_L$ in the equations for $Z_o$ and $t_o$. Therefore,

$Z_L = \sqrt{L_o/(C_o + C_L)} = 26.95\ \Omega$, and $t_L = \sqrt{L_o (C_o + C_L)} = 114.5$ ps/cm.

Unlike  its  predecessors  made  of  TTL  interfaces,  Futurebus+  eliminates  the  need  for  settling 
time  due  to  reflections  by  satisfying  the  first  incident  switching  on  the  bus  using  the  BTL 
interface.^  Since  there  is  no  settling  time,  delay  on  the  bus  is  decided  only  by  the  propagation 
delay,  which  will  be  3.44  (ns)  for  a  30  cm  board.  This  imposes  one  of  the  fundamental  limits  on 
the  performance  of  the  Futurebus+.  In  comparison  the  time  required  to  transfer  an  optical  signal 
30  cm  in  air  is  1  ns,  and  2  ns  in  glass  (n=1.5). 
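
The loaded-line figures above can be reproduced with a few lines of arithmetic. The sketch below uses the values quoted in the text (57 Ω, 0.95 pF/cm, 3.3 pF/cm, 30 cm); the function names and the symbol names in the comments are illustrative assumptions. It relies on the identities that the unloaded delay equals impedance times capacitance (Ω·pF = ps) and that adding load capacitance scales the impedance down and the delay up by the same factor.

```python
import math

C_CM_PER_NS = 29.9792458  # speed of light in vacuum, cm/ns

def loaded_line(z0_ohm, c0_pf_per_cm, cl_pf_per_cm):
    """Return (loaded impedance in ohm, loaded delay in ps/cm)."""
    t0_ps_per_cm = z0_ohm * c0_pf_per_cm            # sqrt(L*C) = Z0 * C0
    factor = math.sqrt(1.0 + cl_pf_per_cm / c0_pf_per_cm)
    return z0_ohm / factor, t0_ps_per_cm * factor

def flight_time_ns(length_cm, refractive_index):
    """Ideal optical time of flight through a uniform medium."""
    return refractive_index * length_cm / C_CM_PER_NS

z_loaded, t_loaded = loaded_line(57.0, 0.95, 3.3)
electrical_delay_ns = 30.0 * t_loaded / 1000.0

print(z_loaded, t_loaded, electrical_delay_ns)
print(flight_time_ns(30.0, 1.0), flight_time_ns(30.0, 1.5))
```

The electrical values match the 26.95 Ω, 114.5 ps/cm and 3.44 ns quoted above. For the optical path the ideal bulk flight time is about 1 ns in air and about 1.5 ns for n = 1.5, so the 2 ns quoted for glass presumably includes rounding or some guiding overhead; that interpretation is a guess rather than something the summary states.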

In  order  to  determine  the  period  of  time  required  to  complete  a  data  transfer,  it  will  be  necessary 
to  estimate  the  delays  introduced  by  the  electrical  and  optical  bus  interface  devices.  Electrical  bus 
interfaces  are  designed  to  activate  a  receiver  with  the  incident  edge  of  the  transmitted  signal.  The 
specifications  for  an  advanced  BTL  electrical  transceiver  interface^  are  used  to  determine  the 
performance of the electrical Futurebus+. The corresponding delays (A-F in Fig. 1) for an
asynchronous data transfer in the electrical case are:

A:  Maximum  transmitter  delay  (~  7  ns) 

B:  Propagation  delay  along  the  backplane  (~  3.5  ns) 

C:  Maximum  Receiver  delay  (=  receiver  enable  delay  ~  12  ns) 

D:  Maximum  transmitter  delay  (~  7  ns) 

E:  Propagation  delay  along  the  backplane  (~  3.5  ns) 

F:  Typical  Receiver  delay  ( ~  5  ns) 

This results in a total delay (A+B+C+D+E+F) of 38 ns. It is closely matched with the
performance prediction of the Futurebus+.[3] The optical data transfer time is calculated using the
specifications for the OETC (Opto-Electronic Technology Consortium) 500 Mbps 32 channel [10] and
the Hitachi 200 Mbps 8 channel [11] optical transmitters and receivers. In this case the delays (A-F in
Fig. 1) for an asynchronous data transfer are:

A: Transmitter delay (~ 1 ns) = laser driver circuit delay and skew (~ 500 ps)
+ average laser turn-on delay (less than 250 ps [10]) + laser turn-on skew / 2 (~ 300 ps [11])
(cf. at a data rate of 500 Mbps, the maximum transmitter delay should be less than 2 ns)

B: Propagation delay in free space (30 cm; ~ 1 ns)

C: Receiver delay (~ 2 ns [10]) = average photodiode circuit delay (~ 1.3 ns)
+ skew / 2 (150 ps) + decision level variation skew (350 ps)
(cf. at a data rate of 500 Mbps, the maximum receiver delay should be less than 2 ns)

D:  transmitter  delay  (1  ns) 

E:  propagation  delay  in  free  space  (~  1  ns) 

F:  receiver  delay  (~  2  ns) 

This results in a total delay of 8 ns for a free-space backplane and 10 ns for a glass waveguide backplane,
which is an improvement of 4.75 and 3.8 times, respectively, over the 38 ns of the electrical case.
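
The delay budgets can be tallied directly. The sketch below sums the six stage delays A to F using the rounded values listed above; the dictionary and function names are illustrative, and the 2 ns glass propagation figure is the value used in the text.

```python
# Stage delays in ns, as rounded in the text (A..F of Fig. 1).
ELECTRICAL = {"A": 7.0, "B": 3.5, "C": 12.0, "D": 7.0, "E": 3.5, "F": 5.0}
OPTICAL_FREE_SPACE = {"A": 1.0, "B": 1.0, "C": 2.0, "D": 1.0, "E": 1.0, "F": 2.0}
# Same interface devices, but 2 ns propagation (B and E) in a glass waveguide.
OPTICAL_GLASS = dict(OPTICAL_FREE_SPACE, B=2.0, E=2.0)

def cycle_time_ns(stage_delays):
    """Total time for one compelled-mode handshake cycle."""
    return sum(stage_delays.values())

electrical = cycle_time_ns(ELECTRICAL)
for name, budget in (("free space", OPTICAL_FREE_SPACE), ("glass", OPTICAL_GLASS)):
    total = cycle_time_ns(budget)
    print(name, total, electrical / total)
```

Because the handshake traverses transmitter, bus and receiver twice per cycle, every nanosecond saved in a transceiver or on the line is counted twice in the total, which is why the improvement factor is large even though the propagation saving alone is modest.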

Conclusions 

The  evaluation  of  Futurebus+  propagation  delay  characteristics  shows  that  a  simple 
replacement  of  electrical  line  drivers,  lines,  and  receivers  with  existing  electro-optic  components  in 
an  optical  backplane  can  provide  substantial  improvement  to  the  bus  performance.  This  results 
from  two  characteristics  of  the  Futurebus+  architecture  and  hardware.  First,  the  handshaking 
protocol doubles the effects of propagation and transceiver delays on transmission bandwidth.
Second,  the  fundamentally  different  loading  characteristics  of  the  optical  backplane  paths  which  are 
not  a  function  of  line  loading  also  reduce  transceiver  device  delays. 

In  the  remainder  of  this  presentation  we  discuss  the  power,  noise,  and  bandwidth 
characteristics  of  the  most  competitive  electrical  interconnect  technologies  (BTL,  GTL,  and  ETL) 
and compare their operation to demonstrated electro-optic interface components. We then use these
results  as  a  guide  for  designing  the  optical  elements  of  two  different  optical  backplane 
configurations.

The construction of a programmable multilayer analogue neural network using space
invariant interconnects

N. Collings, A.R. Pourzand, R. Völkel,

Institute of Microtechnology, University of Neuchâtel, Switzerland.

Introduction 

The use of a multilayer neural network is indicated in those cases of pattern classification
where the input has a relatively low spatial complexity, e.g. 16 x 16 pixels. Such an input
size arises in the post-segmentation stage of handwritten character recognition, or more
generally after a pre-processing stage on more complex input scenes. Since the
pre-processing is likely to be optical, e.g. Fourier or wavelet transform, it is of interest to
consider the construction of an optical neural network where the training might be slow, due
to the speed of the interface of the programmable weight matrices, but the classification
stage would proceed at rates superior to electronics. This involves the use of a stand-alone
analog optical device for the intermediate layer of neural thresholding elements (hidden
layer) in between the two layers of interconnects. The critical aspects of such an approach
are  the  engineering  of  the  programmable  interconnect,  the  characteristics  of  the  hidden  layer 
optical  device,  the  question  of  optical  subtraction,  and  the  use  of  discretization  techniques 
to  avoid  the  deleterious  consequences  of  analog  noise.  The  first  aspect  will  be  discussed  in 
this  summary  and  the  other  aspects  will  be  more  fully  reported  at  the  conference.  The 
optical  design  of  the  system  was  presented  previously  [1],  and  this  report  will  concentrate 
on  the  practical  results. 

Overview  of  optical  system 

The layout of the proposed optical system is shown in Fig. 1. The programmable
interconnect in the two layers of interconnects (LCTV2 & 3) is based on liquid crystal
television screens from a VPJ-2000 TV projector (Seiko-Epson). These screens have an
anisotropic pixel layout (480 x 440 pixels on a 56 x 46 µm pitch), and there are three of
them, tuned to the blue (475 nm), green (535 nm), and red (610 nm) wavelengths. It is
convenient to use the blue screen for the input (LCTV1), the green for LCTV2, and the red
for LCTV3. We have selected a liquid crystal light valve for the hidden layer (LCLV1)
which can be written at 488 nm and read at 633 nm (Micro-Optics Technologies SPT-25).
The shape of the transfer function of LCLV1 has been fitted to a sigmoid curve [2], and the
curve fit has been used in a simulation based on a discretization algorithm [3]. The output
activation functions on the read side of the valves are monitored by photodetector arrays
(D2 & 3). Since the valves cannot perform optical subtraction, this is performed in the PC
as follows [4]. D1 monitors the input activity levels of the hidden layer neurons and
transfers them to the PC where the thresholds are subtracted. Corrected weight values are
then transmitted to the weight plane (LCTV2).

Not shown in Fig. 1 are the generation of the 16 x 16 spot array for illuminating LCTV1,
using grating G1, and the generation of the 16 x 16 spot array for illuminating LCLV1,
using grating G3.

Implementation of interconnect between input and hidden layer

The implementation of a full interconnect between the input plane (LCTV1) and the hidden
layer plane (LCLV1) requires a tight tolerance optical system, where we employ one pixel
of the weight screen (LCTV2) per interconnect channel. This interconnect replicates the 16
x 16 input array on LCTV1 (Fig. 2a) 256 times on LCTV2, and reduces each 16 x 16 block
on LCTV2 to a single block on the write side of LCLV1 (Fig. 2b). It is convenient to
choose the input array using adjacent pixels on the input LCTV screen (LCTV1), so that the
image replication is performed at unity magnification. We have found that the precision
with which the period of the gratings, G1 and G2, can be fabricated is about 1%. Therefore,
a converging beam arrangement (Fig. 1) has been selected in order to allow tuning of the
fan-out spacing by axial displacement of the grating. The spacing can be tuned by several
microns per mm of axial displacement.
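
The replicate, weight and fan-in sequence described above is an optical matrix-vector product. The sketch below is a purely numerical illustration of that data flow, assuming ideal optics and non-negative transmittance weights; the function names are hypothetical and no attempt is made to model the LCTV or LCLV response.

```python
def optical_layer(inputs, weights):
    """inputs: flattened input activities (one value per input pixel).
    weights: one row of transmittances per hidden neuron, each row being the
    block of weight-screen pixels that sits over one replica of the input."""
    hidden = []
    for block in weights:                       # one input replica per hidden neuron
        transmitted = [x * w for x, w in zip(inputs, block)]  # per-pixel weighting
        hidden.append(sum(transmitted))         # fan-in: beams overlap on the valve
    return hidden

def weight_screen_side(neurons_per_side, pixels_per_channel_side=1):
    """Pixels needed along one side of the weight screen for a full
    interconnect between two square layers of the same size."""
    return neurons_per_side * neurons_per_side * pixels_per_channel_side

print(optical_layer([1, 0, 1], [[0.5, 1, 0.5], [0, 1, 0]]))
print(weight_screen_side(16))
```

For a 16 x 16 input fully connected to a 16 x 16 hidden layer with one pixel per channel, the weight screen needs 256 pixels on a side, which fits within the 480 x 440 pixel screens described earlier. Since transmittances cannot be negative, signed weights need the separate subtraction step that the text assigns to the PC.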

Since the write side of LCLV1 is not pixellated, we rely on the overlapping of the beams to
provide the integration (fan-in) function. By the same token, we are free to scale the block
size and spacing. In the interests of cascadability, where another LCTV screen (LCTV3) is
used for the 2nd weight plane, it is convenient to use a rectangular pixel spacing format of
the same ratio as the LCTV. Then the pixel spacing of the LCTV can be recovered in a
simple demagnification stage. Because the central uniformity patch of LCLV1 is less than
15 x 10 mm², we have chosen to perform a 4 x demagnification by means of a telescope
onto the write side of the valve. A further demagnification of 6.25 x in the second layer will
then recover the repeat spacing of 56 x 46 µm.

All the optics for the first layer up to LCTV2 have been designed for a fully interconnected
16 x 16 input to 16 x 16 hidden layer. However, difficulties with the LCTV address
electronics have obliged us to downgrade the number of effective neurons to 8 x 8 in the
two layers. Hence, we can use a unit cell of 2 x 2 pixels on the LCTV2 to code the weight
of each interconnect channel.

The 8 x 8 replication of the input array after passing through a fully transmitting LCTV2
and the telescopic reduction is shown in Fig. 2b.

This  work  was  a  collaborative  project  with  the  Institute  Dalle  Molle  of  Artificial 
Intelligence  (IDIAP)  in  Martigny,  and  was  supported  by  the  Swiss  National  Science 
Foundation.

Fig. 1 Component layout of multilayer neural network.

1. Introduction

In recent years, many kinds of optical board-to-board interconnection systems have been proposed and some
of them have been demonstrated [1-9]. Basic interconnection systems using laser diodes and photodetectors connected
through free space [6] or Selfoc lenses [5] have been shown. These systems are based on sets of one-to-one
interconnections through free space. Another approach uses fibers. To achieve fixed many-to-many interconnections,
optical fiber ribbons, laser diode arrays and detector arrays are useful [3].

One-to-many interconnection, as in a system bus, is another important basis for the construction of multi-CPU
systems. Optical technology promises wide-bandwidth interconnections on bus systems. The
key technique for optical one-to-many interconnection is the distribution of optical signals. This can be realized using
optical fibers and star couplers. However, a large number of fiber connections and complex systems are necessary.

"Dialog"  is  the  system  on  the  basis  of  a  cylindrical  mirror  and  laser  diodes  with  a  free-space 
interconnection [1,2]. The optical signals are broadcast according to the diffraction angle of the laser diodes and the
curvature of the cylindrical mirror. J. Jiang added the idea of a 2-dimensional waveguide (parallel plate stack with
cylindrical mirror) to this system [7]. These systems based on a cylindrical mirror could use only 60 to 70 degrees around
the mirror because of the angle dependence of the distribution. Systems which utilize only the emission angle of the
optical devices were also proposed [8,9]. Boards were located around circular parallel stacked plates [8] or a cylindrical
space [9]. These systems introduced multi-wavelength LEDs [8] or beam-steering laser diodes [9]. In both cases
optical signals could not reach the boards near the board which broadcasts the signals, and the systems needed a relay
mechanism [9].

In this paper I propose an optical broadcasting system with a cylindrical lens. By using a cylindrical lens,
boards can be located over the full angle (360 degrees) around a cylindrical space. This idea can be applied to both free-space and
parallel plate stack systems.


2. Concept and Experiments

The basic concept is shown in Fig. 1. The output of the laser diodes located on a board is collimated and
projected onto a cylindrical lens. Surface reflection at the lens spreads the projected optical power around the lens,
and the angle of the transmitted light changes according to the angle of incidence on the lens surface. All photodetectors
mounted on the boards receive the optical signal from the direction of the cylindrical lens.

In the experiment, a He-Ne laser with 15 mW output power, a PIN photodiode, optics and a glass rod 3 mm in
diameter were used. The laser beam was collimated to the same diameter as the rod and projected onto the rod.
The polarization of the beam was parallel to the z-axis of the rod. The optical power at each angle was measured
by the detector. The results are shown in Fig. 2.

3.  Simulations 

The top view of a cylindrical lens is shown in Fig. 3. The
light is projected parallel to the horizontal axis from the left side of
the figure. The input light which reaches point A is reflected in the
direction $\psi_a$ with power $P_a$. Transmitted light refracts at points A
and B and leaves at the angle $\psi_b$ with power $P_b$. The light reflected
at point B refracts at point C and leaves in the direction $\psi_c$ with
power $P_c$. Here, the light reflected at point C is ignored.


Fig.l  The  basic  concept  of  an  optical 
bus  using  a  cylindrical  lens. 




The incident angle at point A is $\theta_1$ and the refracted
angle is $\theta_2$. $\psi_a$, $\psi_b$ and $\psi_c$ are written as

$\psi_a = 2\theta_1$, (1)

$\psi_b = \pi - 2(\theta_1 - \theta_2)$, (2)

$\psi_c = 2(2\theta_2 - \theta_1)$, (3)

respectively. When the input light is polarized orthogonally
to the z-axis of the lens (n-polarized), the power
reflectivity R and transmittance T at points A, B and C
are

$R = \dfrac{\left((n_1/n_2)\cos\theta_1 - \cos\theta_2\right)^2}{\left((n_1/n_2)\cos\theta_1 + \cos\theta_2\right)^2}$ (4)

and

$T = \dfrac{4(n_1/n_2)\cos\theta_1\cos\theta_2}{\left((n_1/n_2)\cos\theta_1 + \cos\theta_2\right)^2}$ (5)

where $n_1$ and $n_2$ are the refractive indices outside and inside
the cylindrical lens, respectively. In the other case,
parallel polarization (p-polarized), R and T are written
as

$R = \dfrac{\left(-\cos\theta_1 + (n_1/n_2)\cos\theta_2\right)^2}{\left(\cos\theta_1 + (n_1/n_2)\cos\theta_2\right)^2}$ (6)

and

$T = \dfrac{4(n_1/n_2)\cos\theta_1\cos\theta_2}{\left(\cos\theta_1 + (n_1/n_2)\cos\theta_2\right)^2}$. (7)

Fig.2 Experimental results of optical power
distribution by a cylindrical lens. 0 degree means
the opposite direction of the input laser beam.

Fig.3 Ray tracing of a projected
light beam to a cylindrical lens.

The powers detected at the angles $\psi_a$, $\psi_b$ and $\psi_c$,
namely $P_a$, $P_b$ and $P_c$, are

$P_a = RG$, $P_b = T^2 G$ and $P_c = T^2 R G$, (8)

where G is the intensity of the light projected to point A.
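
The ray model above is simple to evaluate numerically. The following sketch traces a single ray with Snell's law and the Fresnel power coefficients for the two polarization cases, and returns the three exit angles and relative powers for G = 1. The function name, dictionary keys and default indices are illustrative assumptions, and the expression used for the straight-through angle is the geometric result for a ray crossing a circular rod, adopted here as an assumption.

```python
import math

def trace_ray(theta1, n1=1.0, n2=2.6, polarization="n"):
    """Exit angles (radians, 0 = back towards the source) and relative powers
    for one ray hitting the rod at incidence angle theta1, with G = 1."""
    theta2 = math.asin(n1 * math.sin(theta1) / n2)   # Snell's law at point A
    ratio = n1 / n2
    c1, c2 = math.cos(theta1), math.cos(theta2)
    if polarization == "n":
        denom = (ratio * c1 + c2) ** 2
        R = (ratio * c1 - c2) ** 2 / denom
    else:  # "p"
        denom = (c1 + ratio * c2) ** 2
        R = (-c1 + ratio * c2) ** 2 / denom
    T = 4.0 * ratio * c1 * c2 / denom
    return {
        "psi_a": 2.0 * theta1,                         # surface reflection at A
        "psi_b": math.pi - 2.0 * (theta1 - theta2),    # assumed: straight-through ray
        "psi_c": 2.0 * (2.0 * theta2 - theta1),        # one internal reflection at B
        "P_a": R, "P_b": T * T, "P_c": T * T * R,
        "R": R, "T": T,
    }

brewster = math.atan(2.6 / 1.0)
print(trace_ray(0.0)["P_a"], trace_ray(brewster, polarization="p")["P_a"])
```

Two quick sanity checks follow from the formulas: R + T = 1 for either polarization, and the p-case reflectivity vanishes when the incidence angle satisfies tan θ = n2/n1. A full simulation of the kind reported in the text would additionally weight each ray by a Gaussian beam profile and bin the powers into the detector aperture, which this sketch does not attempt.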

In these computer simulations, the light beam has a Gaussian profile and about 84% of the total beam power
is projected onto the lens. The results with $n_1$ = 1.0 and $n_2$ = 2.6 are shown in Fig. 4. The vertical axis is the input power
to a detector which is located at each angle and has a 0.5 degree aperture. When we construct a 40 cm diameter
system,  the  aperture  is  about  1.7mm.  At  0  degree  the  detector  get  about  3x10'^  times  the  total  power  of  the  input 
beam. With the p-polarized beam, the reflected power falls to zero at the Brewster angle.

The total detected power for various values of $n_2$ is shown in Fig. 5. High refractive indices show good
characteristics with high minimum values. A refractive index of 3.6 is possible with some kinds of semiconductor
materials.


4.  Conclusion 

A  new  concept  of  a  one-to-many  optical  interconnection  system  with  a  cylindrical  lens  is  proposed.  An 
optical  beam  projected  to  a  cylindrical  lens  is  distributed  by  surface  reflection  and  the  refraction  of  the  lens. 


Fig.4 The results of computer simulations: optical power detected by a photodetector
with a 0.5 degree aperture for an n-polarized input beam (a) and a p-polarized input beam (b).


Computer simulations and a basic experiment were
carried out. The simple structure realizes a wide signal
bandwidth and little angle dependence. This
concept can be applied to many kinds of optical systems.
The idea of this study is based on earlier
unpublished research in the Seko Laboratory of Keio
University.

1. Introduction


Over the last few years, micron-size opto-electronic devices have become key components
in optically interconnected electronic chips. The possibility of integrating these components in
conventional electronic families (ECL, CMOS, FET) has allowed the design of various hybrid
processing units [1] whose common feature is the optical reception, modulation or emission and the
electronic treatment of information. The use of this hybrid technology, defined as smart pixel
technology, has provoked recent studies on whether a smart pixel array, given its performance [2, 3],
cost and reliability [4, 5], can be regarded as a viable solution in optical information
processing. This paper focuses on the current issues involved in increasing the complexity of individual
processing nodes in order to optimize the performance of the opto-electronic processor array.
The optimum smartness of the pixel, that is, its optimum degree of complexity, is quantified
on algorithmic, electronic and optical grounds. Having in mind a particular task, such as data
sorting, performance metrics of the system are analyzed in terms of its throughput rate, power
consumption, real-estate and laser source power requirements. The use of a particular example,
such as the bitonic sorter [6], allows us to quantify exactly the power dissipated and the layout
area occupied by the smart pixel in each of the different electronic families examined. Finally,
optically-induced electronic power dissipation makes it necessary to decrease the optical power at each
photodetector for a given operating frequency and bit-error rate (BER). This issue is analysed
in the context of global optimization of the emitter (or modulator) and detector associated with
each pixel.

The optical transceivers chosen in this study are SEEDs (Self Electrooptic Effect Devices) [7],
which are either flip-chip bonded onto a CMOS electronic chip [8] or monolithically integrated in
GaAs MESFET technology (FET-SEED) [10]. For all families considered, the chip area is 1 cm²
to allow a good yield in the manufacturing of the chip and an easy implementation of the optical
hardware. The heat removal capability has been fixed at 10 W cm⁻², for which conventional
cooling methods are just adequate. The maximum laser source power is limited to 1 W for quasi
CW and pulsed operation, and 10 W for the CW mode of operation.

2. Algorithmic considerations.

The pixel density depends on the degree of smartness of the pixel if no topological
separation of the transceivers from their logic circuitry is undertaken. The design of all arrays
encountered to date, then, shares a common characteristic: the pixels are regularly distributed
along both dimensions. The logic circuitry associated with each pixel lies in the proximity of
their transceivers, rendering the whole array functionally partitioned. The technology-dependent
area occupied by logic circuitry determines the degree of intelligence of the pixel. The smarter
the pixel, the larger the pixel pitch and the smaller the packing density. In the case of the bitonic
sorter, the system throughput rate will be analyzed with respect to the type of electronic family
(MESFET, CMOS) which implements the sorting node and with respect to the sophistication of
the node itself.

Optimising  the  performance  of  a  particular  task  depends  not  only  on  the  sophistication  of  the 
nodes  but  also  on  architectural  constraints.  Much  of  our  analysis  is  based  around  the  use  of  the 
EX-CLIP  architecture  [9],  which  may  be  thought  of  as  an  iterative  loop  containing  a  logic  plane 
with  local  memory  and  a  non-local  interconnection  that  may  be  selected  to  best  perform  the  task 
in  question.  For  a  given  architecture,  the  algorithmic  analysis  must  consider  two  related  criteria 
in  the  light  of  the  physical  and  technological  constraints  :  the  optimum  logic  functionality  at  a 
particular  pixel  and  the  optimum  algorithmic  complexity  at  a  processing  node.  In  the  case  of  the 
EX-CLIP processor, decreasing the pixel complexity from a fully integrated 2-by-2 sorting node
to  a  single  S-SEED  (NAND,  NOR)  functionality,  while  increasing  the  pixel  density,  results  in 
an  increase  in  the  number  of  pixel-array  transfers  necessary  to  implement  the  sorting  algorithm 
and  also  decreases  the  time  required  for  an  array  transfer.  Conversely,  if  we  modify  the  bitonic 
sorting algorithm to accommodate the more complex 4-by-4 merge/splitting node, the number of
iterations  to  complete  the  task  is  reduced  but  there  is  an  increase  in  the  iteration  time  and  in  the 
necessary  pixel  area.  It  is  therefore  apparent  that  such  considerations  are  essential  in  determining 
the  optimum  smartness  of  pixel  which  maximizes  the  throughput  for  both  general  and  special 
purpose  computational  schemes. 
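
The iteration count in this trade-off can be made concrete for the textbook bitonic sorter built from 2-by-2 compare-exchange nodes, which needs log2(N)(log2(N)+1)/2 parallel stages for N inputs. The following Python sketch is illustrative only: it is a sequential software model, the function name is ours, and it does not represent the optical hardware or the EX-CLIP implementation. It sorts a list and counts the stages, each of which would correspond to one pass through a plane of 2-by-2 nodes.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic network.

    Returns (sorted_list, stages), where stages counts the parallel
    compare-exchange steps (one pass through a plane of 2-by-2 nodes each).
    """
    data = list(values)
    n = len(data)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    stages = 0
    k = 2
    while k <= n:                 # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:             # distance between the two compared elements
            for i in range(n):
                partner = i ^ j
                if partner > i:   # handle each pair once
                    ascending = (i & k) == 0
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            stages += 1           # all pairs above act in parallel in hardware
            j //= 2
        k *= 2
    return data, stages


if __name__ == "__main__":
    result, stages = bitonic_sort([9, 3, 14, 0, 7, 12, 1, 5, 11, 2, 15, 8, 4, 13, 6, 10])
    print(result, stages)
```

For 16 inputs the network uses 10 stages. A more complex node that absorbs several stages reduces this count at the price of a longer time per stage, which is the balance discussed above.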


3.  Latencies  and  power  dissipations 


The transfer and processing times of information can be divided into five categories: 1.
the time of flight, T_opt, of the optical output from one array onto the input photodetector of the
successive array; 2. the conversion time, T_conv, of the optical beams into voltage (or current)
swings; 3. the amplification (and/or decision) time, T_amp, of the input electronic signals into
VLSI-compatible logic levels; 4. the processing time, T_elec, of the logic circuitry; and 5. the
conversion time, T_out, of the electronic output into modulator-compatible levels. The dependence of
these times on the mode of operation of the array (quasi-CW, pulsed, CW) and on the optical power
incident on the transceivers will be presented. For example, T_amp will increase as the optical
input power decreases, since it is likely that multiple amplification stages will be needed in
order to achieve higher gain. In the same manner, T_conv has been shown to depend on either the
power or the energy of the input beams for SEEDs [7] and FET-SEEDs [10], respectively.

In the same manner, the power budget of the sorting node can be calculated. SPICE
simulation allows us to quantify the power dissipated by the logic circuitry, whereas
the dynamic equations for the transceivers provide the optically-induced electronic powers. The
power dissipated will be given with respect to the frequency of operation and the mode of
operation of the array. Depending on the pixel smartness, the technology used and the frequency of
operation, either the laser source power or the heat removal capability will be the limiting
factor.

4. Global optimisation of the transceivers


In order to decrease the overall power consumption of the node, several solutions have been
proposed: 1. a decrease of the responsivity of the modulator, achieved by a decrease of the
photo-generated carrier lifetime [11, 12]; 2. an increase of the sensitivity of the photodetector; 3.
a reduction in the optical input power. The last proposition will be analysed with respect to the
resulting input voltage or current swing induced by the input signal. In order to provide
VLSI-compatible logic levels, a high-gain amplification becomes necessary. This has four deleterious
consequences: 1. the higher gain demands more silicon area, reducing the space available for logic
operation; 2. the amplification power increases, adding to the total chip power consumption; 3.
T_amp is increased, as explained before; 4. for certain amplification schemes, the power available
might not achieve the required BER and frequency of operation. On the other hand, an increase
of the read beam power from one array allows a reduction of the amplification stage of the pixels
of the next array, since the input power is increased. This is achieved, however, at the expense of
optically-induced power consumption at the modulators of the first array. There exists therefore
an optimum input power which minimizes the total power dissipated at the transceivers and
amplifiers without unduly increasing the latency and area of those components.
This optimum can also be translated, for a given laser source power, into an optimum fanout in
the case of optical clock distribution [13].
Background / Motivation

As  voice  and  data  communications  networks  proliferate,  they  face  ever  increasing  demands  for 
reliability,  portability,  and  bandwidth.  In  many  applications,  the  transmitted  power  is  limited  by  practical 
considerations.  Examples  include  satellite,  cellular,  and  undersea  long  haul  fiber  communications 
systems.  In  these  applications  Forward  Error  Correction  (FEC)  techniques  may  be  used  to  achieve  reliable 
communications  within  the  constrained  power.  FEC  techniques  are  ultimately  limited  in  their 
performance  by  the  conflicting  requirements  of  high  speed,  high  computational  complexity,  and  low  size 
and  power  consumption.  VLSI  implementations  of  the  elegant  and  powerful  Viterbi  convolutional 
decoding  algorithm  (VA)  [1],  which  uses  a  recursive  parallel  search  computation,  are  limited  by  the 
massive  intra-  and  inter-chip  communications  requirements  between  nodes  of  the  search  graph.  This 
constraint  limits  the  number  of  states  (nodes  of  the  VA  graph),  for  high-speed  applications,  and  hence  the 
overall  performance  of  the  VA.  Current  high  speed  single  chip  VLSI  implementations  are  limited  to  a 
convolutional constraint length of about 7 and therefore require 2^7 = 128 processing nodes. Incrementing
the  constraint  length  by  one  provides  nearly  an  order  of  magnitude  improvement  in  BER  [2],  but  requires 
twice  as  many  computational  and  communications  resources  -  beyond  the  capabilities  of  a  single  chip. 
This  size  constraint  limits  single  chip  VLSI  implementations  to  a  coding  gain  of  ~7  dB.  Strong 
motivation  exists  for  using  longer  constraint  length  codes,  requiring  several  decoding  ICs.  A  multi-chip 
VLSI  VA  implementation  is  impractical  for  high  speed  applications  due  to  the  inter-chip  communications 
bottleneck.  The  approach  discussed  in  this  paper  overcomes  this  limitation  by  employing  free-space 
optical  interconnects  to  provide  the  required  inter-chip  connection,  while  maintaining  on-chip  speeds 
between  chips. 


Free-space  Optical  VA  Approach 

The VA is a parallel recursive search algorithm, requiring a regular shuffle interconnection
between 2^k nodes, where k is the constraint length. The required interconnection between
these nodes is an inverse perfect shuffle for path metric transmission and a perfect shuffle for
traceback and readout. A single optical system can be used for data transfer in both directions
simultaneously [3]. Figure 1 depicts a retro-reflective system which interconnects a multi-chip
Viterbi processor, in which all Optoelectronic Integrated Circuits (OEICs) are located in a single
plane. This approach allows the VA nodes to be distributed amongst many chips, while maintaining
high speed. Traditional VLSI implementations utilize ~1/3 of the chip real estate for inter-node
connections [4]. For example, a 4x4 chip array would contain approximately 16x the number of
processing nodes of a single chip. Since every factor of 2 in the number of processing nodes
provides ~0.5 dB of coding gain, this 4x4 multi-OEIC approach provides an additional 2 dB of
coding gain or ~4 orders of magnitude improvement in BER.

Figure 1. Optically interconnected VA decoder.
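
The 2 dB figure follows directly from the rule of thumb quoted above. The short Python sketch below repeats that arithmetic; the function names are ours, and the 0.5 dB per doubling is the approximate value given in the text rather than a property of any particular code.

```python
import math


def extra_coding_gain_db(node_ratio, db_per_doubling=0.5):
    """Coding gain added when the number of processing nodes grows by node_ratio."""
    return db_per_doubling * math.log2(node_ratio)


def extra_constraint_length(node_ratio):
    """Each doubling of the node count supports one more unit of constraint length."""
    return math.log2(node_ratio)


if __name__ == "__main__":
    ratio = 4 * 4  # a 4x4 array of chips holds about 16x the nodes of one chip
    print(extra_coding_gain_db(ratio), extra_constraint_length(ratio))
```

Sixteen times as many nodes corresponds to four doublings, hence about 2 dB and a constraint length longer by four.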
The VA is implemented with this architecture as follows. The encoded input data are broadcast
to all of the nodes of the array, where they are used to compute the cost metrics (i.e., Hamming distance)
for the edges of the associated VA trellis. This is a simple 2 or 3 bit computation that may be accomplished
with a 4 or 8 position look-up table at each node. These metrics are then added to a stored metric at each
pixel-pair, which is the metric accumulated so far in the recursion at that node. The sum is then broadcast
from each node, over the optical interconnection network, to the associated nodes corresponding to the
next stage of the trellis. If the dynamic range of the metric data is small enough, these data can be
transmitted in analog form, but a binary representation may be preferred. The metric data detected at the
receiving array are compared pairwise. The better metric (e.g., the lower valued) is stored to replace the
previous cumulative metric and the bit (0 or 1) associated with the "winning" node is appended
to a small local buffer located at each pixel pair. This buffer is as long as the decoded
message length. Typically it will be at least several constraint lengths long (approximately 20-40 bits).
When this buffer is full, the decoded word can be read out by reversing the direction of the network and
reading the stored values of the maximum likelihood estimate of the coded word, effectively tracing the
path of the final survivor through the trellis. Because of the bidirectionality of the optical interconnection
system, the read-out process of sequence n can occur simultaneously with the forward pass VA decoding
process of sequence n+1. The recursion would therefore suffer only the usual fixed latency of a VLSI
approach (equal to several constraint lengths) in reading out the decoded data.
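
The add, compare, select and traceback steps described above can be sketched in software. The Python example below is a minimal hard-decision Viterbi decoder written under our own assumptions: it uses a small rate-1/2, constraint-length-3 code with generators 7 and 5 (octal), which the text does not specify, and it runs the per-node work in a sequential loop where the optical system would perform it in parallel. The `history` list plays the role of the local survivor buffers.

```python
G = (0b111, 0b101)        # generator polynomials: an assumed example code
K = 3                     # constraint length of the assumed code
N_STATES = 1 << (K - 1)   # states under this shift-register convention


def parity(x):
    return bin(x).count("1") & 1


def branch_output(state, bit):
    """Encoder output pair produced when `bit` is shifted in from `state`."""
    reg = (bit << (K - 1)) | state
    return tuple(parity(reg & g) for g in G)


def next_state(state, bit):
    return ((bit << (K - 1)) | state) >> 1


def encode(bits):
    state, out = 0, []
    for b in bits:
        out.extend(branch_output(state, b))
        state = next_state(state, b)
    return out


def viterbi_decode(received):
    """Hard-decision decoding with Hamming-distance branch metrics."""
    inf = float("inf")
    metrics = [0] + [inf] * (N_STATES - 1)   # the encoder starts in state 0
    history = []                             # per step: (previous state, bit) for each state
    for t in range(0, len(received), 2):
        symbol = received[t:t + 2]
        new_metrics = [inf] * N_STATES
        choice = [None] * N_STATES
        for s in range(N_STATES):            # done in parallel by the node array
            if metrics[s] == inf:
                continue
            for b in (0, 1):
                expected = branch_output(s, b)
                cost = metrics[s] + sum(e != r for e, r in zip(expected, symbol))  # add
                ns = next_state(s, b)
                if cost < new_metrics[ns]:   # compare and select the survivor
                    new_metrics[ns] = cost
                    choice[ns] = (s, b)
        metrics = new_metrics
        history.append(choice)
    # Trace the final survivor back through the stored decisions.
    state = min(range(N_STATES), key=lambda s: metrics[s])
    bits = []
    for choice in reversed(history):
        state, b = choice[state]
        bits.append(b)
    return bits[::-1]


if __name__ == "__main__":
    message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
    coded = encode(message)
    coded[5] ^= 1                            # inject a single channel error
    print(viterbi_decode(coded) == message)
```

The forward loop corresponds to path metric transmission and the final loop to traceback and readout, the two directions that the optical system would carry over the same network.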

VA  Smart  Pixel  Design  Issues 

Figure 2 is a functional diagram of a single VA node when implemented as a smart pixel. The
recursive VA requires an interconnection network for forward propagation of path metrics and a backward
interconnection network for readout. To achieve this, each smart pixel has 4 optical inputs and outputs:
the first and third columns of emitters and detectors are used to propagate the path metrics and the second
and fourth are used in trace-back/readout. In this way the two interconnection networks are interleaved
and utilize the same free-space optics. The smart pixel functions as follows. Accumulated path metrics
are received by the first and third detectors and shifted into the corresponding counters. Next the encoded
data are received and shifted into the 2-bit register. The registers are then compared with the Look Up
Tables (LUT) and the accumulated path metrics are incremented accordingly. The surviving path is then
determined by a comparator, and the new path metric is transmitted to the next stage. The surviving path
is stored in 2-bit memory to determine which paths survive beyond the "Survivor Depth" of the decoder.
This decoded data can then be read out of the processor.




Figure  2.  Functional  diagram  of  VA  Smart  Pixel. 

A single VA processing node with the functionality of Figure 2 requires ~10,000 transistors.
Projected VLSI densities are ~3,300 transistors/mm² [5]. Therefore a single Viterbi processing node would require
about 3 mm². We allocate 1.5 mm² for the VCSELs, detectors, and their associated driver circuitry
(therefore, the optoelectronics occupy ~1/3 of the total OEIC real estate). This OEIC density provides for
~0.5 mm center-to-center spacing between the optical elements shown in Figure 2. The spacings required
by this design are consistent with projected VCSEL array spacings.
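
The area budget above can be checked with a few lines of Python. The transistor count, density and optoelectronic area are taken from the text. The last step is our own assumption, not stated in the text: it treats the pixel as a square with four evenly spaced columns of optical elements in order to see whether a spacing near 0.5 mm is plausible.

```python
import math

transistors_per_node = 10_000   # from the text
density_per_mm2 = 3_300         # projected VLSI density, from the text
opto_area_mm2 = 1.5             # VCSELs, detectors and drivers, from the text

logic_area_mm2 = transistors_per_node / density_per_mm2
total_area_mm2 = logic_area_mm2 + opto_area_mm2
opto_fraction = opto_area_mm2 / total_area_mm2

# Assumed layout: square pixel, four evenly spaced columns of optical elements.
pixel_side_mm = math.sqrt(total_area_mm2)
column_pitch_mm = pixel_side_mm / 4

if __name__ == "__main__":
    print(round(logic_area_mm2, 2), round(opto_fraction, 2), round(column_pitch_mm, 2))
```

Under these assumptions the logic occupies about 3 mm², the optoelectronics about one third of the node, and the column pitch comes out close to 0.5 mm.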

Experiments 

Wide angle VCSEL array imaging experiments with off-the-shelf miniature video camera lenses
(f = 25 mm) were conducted to characterize elements of the interleaved imaging system shown in
Figure 1. A Honeywell-supplied VCSEL array was imaged at positions across the 20° FOV of the
video lens. This FOV was limited by vignetting of the narrow VCSEL beam by the camera lens
barrel (these lenses were designed for receiving off-axis beams, not transmitting them). Data
were collected by a CCD array with ~13 µm square pixels. Figure 3 shows the resulting images for
VCSELs located on axis and at the edge of the field. The data show the distortion at the extreme
off-axis VCSEL location, yet a detector of ~30 µm would capture a significant amount of the
energy. Custom designed optical elements will avoid the vignetting and distortion of the
off-the-shelf lenses.

Since  the  smart  pixel  OEICs  required  for  the  VA  implementation  are  not  yet  available,  we  have 
devised  a  proof-of-concept  experiment  replacing  VCSEL  and  detector  arrays  with  fiber  coupled  emitters 
and  detectors.  The  polished  fiber  ends  are  mounted  in  a  Lucite  backplane  to  mimic  the  detector/emitter 
arrays  in  their  MCM  orientation.  The  system  utilizes  a  PC’s  memory  and  processor  in  place  of  the  VA’s 
integrated  circuitry.  This  setup  will  allow  us  to  evaluate  a  custom  wide  angle  imaging  system  for  use  in  a 
prototype Viterbi decoder. The computer control will provide flexibility in the VA, as well as a means of evaluating the
optical system performance. As VCSEL based smart pixel technology becomes available, the fiber
baseplate  will  be  replaced  with  an  OEIC  array. 

Systems  Application  Example 

Long-haul  undersea  fiber  communications  is  a  good  example  application  in  which  the 
transmitted  power  is  limited.  In  this  case  the  power  limit  stems  from  nonlinear  effects  of  high  power 
transmission. Projected 10 Gbit/s links have repeater spacings of ~30-60 km, making the repeaters a
significant  cost  element  of  the  system  (second  only  to  the  cost  of  the  cable  itself).  Therefore,  there  is 
motivation  for  increasing  the  repeater  spacing.  Since  the  transmitted  power  is  limited,  coding  is  the  only 
way  to  increase  the  repeater  spacing,  while  maintaining  performance.  Given  a  fiber  attenuation  of  0.2 
dB/km,  every  dB  of  coding  gain  provides  ~5  km  additional  repeater  spacing.  The  smart  pixel  based  VA 
approach  will  provide  an  increase  of  2  dB  of  coding  gain  over  traditional  VLSI  approaches,  creating  a 
significant  increase  in  repeater  spacing. 
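
The repeater-spacing argument is a one-line calculation, shown below as a Python sketch. The function name is ours; the attenuation and coding gain values are those quoted in the text, and the model ignores everything except fiber attenuation.

```python
def extra_repeater_spacing_km(coding_gain_db, attenuation_db_per_km=0.2):
    """Extra fiber length whose loss is offset by the given coding gain."""
    return coding_gain_db / attenuation_db_per_km


if __name__ == "__main__":
    for gain_db in (1.0, 2.0):
        print(gain_db, extra_repeater_spacing_km(gain_db))
```

With 0.2 dB/km, 1 dB of coding gain buys about 5 km, so the additional 2 dB claimed for the smart pixel approach would correspond to roughly 10 km per span.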

This  work  is  funded  by  the  Ballistic  Missile  Defense  Organization  through  the  Office  of  Naval  Research. 
Introduction

Smart Pixel based free-space optical interconnects offer a method of establishing high
connection densities, and consequently large data throughputs, in applications such as ATM
networks, massively parallel computers, and photonic backplanes. The design of these optical
systems requires a means of quantifying the trade-offs between the effective processing power of a
Smart Pixel array and the optical interconnection geometry [1,2,3]. In light of this interest, we
have developed a simple model which outlines the trade-offs between optical connection density
and Smart Pixel intelligence. The objective of the model is to define a reasonable operating region
where the following parameters are optimized: the number of transistors per Smart Pixel, the
optoelectronic device window size, the lenslet size, the f/number of the lenslet, and the optical
connection density.


In considering the optical layout, assumptions pertaining to some of the optical parameters
were made in order to limit the number of variables and provide a tractable solution. The optical
system, shown in Figure 1, was based on a 4-f telecentric optical system, operating at 850 nm. A
transmitter array on a Smart Pixel die is relayed to a receiver array on an adjacent Smart Pixel die.
Diffractive lenslet arrays with a focal length of 6.5 mm were chosen, allowing for an array-to-array
spacing of approximately 26 mm. This separation was chosen because it is close to the spacing of
Printed Circuit Boards (PCBs) in most electronic backplanes such as the VME™ standard
backplane.


Figure 1: Telecentric Lenslet Array Optical System


Based on an interest in scalable designs, we assumed that the area of a single lenslet
governed the maximum usable area available for electronics per Smart Pixel. The lenslet size
thus dictates the amount of area on the Smart Pixel die available for processing electronics.
Owing to the recent success in integrating III-V optoelectronics with CMOS electronics [4], the
transistor density for a standard CMOS VLSI 1.2 micron feature-size process was assumed to be
800 transistors/mm². In addition, differential optical I/O was assumed for both transmitters
and receivers.


The optimum lenslet array geometry may be calculated using a Gaussian beam analysis to
find the dependence of lenslet size on optoelectronic device window size. Since lenslet size
is assumed to equal Smart Pixel size, the Smart Pixel size can be related to the size of the
window. The analysis was based on the restriction that the device window size d_w (assuming square
windows) was equal to 3w_0, where w_0 is the beam radius of the focused beam. In the model, the
lenslet diameter, and hence the Smart Pixel size, was adjusted until a minimum beam diameter of
3w fit inside the lens facet for each signal beam. These limits ensured minimum clipping of the
beam.
For the simplest case where one lenslet focuses onto one optoelectronic device, the
following equation is obtained:

\[ D_p = d_w \sqrt{1 + \left( \frac{9 f \lambda}{\pi d_w^2} \right)^2} \]

where D_p is the size of the lenslet, d_w is the size of the window, f is the lenslet focal
length, and \lambda is the wavelength.


Using this basic relation, any additional geometry can be considered by appropriately modifying
it. The model behaves such that as the Smart Pixel and corresponding lenslet array
dimensions increase, the size of the windows decreases. A maximum limit for the size of the
lenslets exists and is determined by the number of phase levels in the diffractive structure and the
f/number of the array [5,6].
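
The single-lenslet relation can be explored numerically. The Python sketch below assumes the Gaussian-beam form D_p = d_w sqrt(1 + (9 f λ / (π d_w²))²), which follows from a waist w_0 = d_w/3 at the window and a clear aperture of 3w at a lenslet one focal length away. It uses the 6.5 mm focal length, 850 nm wavelength and 800 transistors/mm² quoted in the text. Function names are ours, and the transistor figure is simply density times lenslet area, not a result taken from the paper's figures.

```python
import math

WAVELENGTH_MM = 850e-6     # 850 nm expressed in mm
FOCAL_LENGTH_MM = 6.5
TX_PER_MM2 = 800           # CMOS transistor density assumed in the text


def lenslet_size_mm(window_mm, f=FOCAL_LENGTH_MM, lam=WAVELENGTH_MM):
    """Lenslet size that passes 3w of a beam whose waist is window_mm / 3."""
    spread = 9.0 * f * lam / (math.pi * window_mm ** 2)
    return window_mm * math.sqrt(1.0 + spread ** 2)


def transistors_under_lenslet(window_mm):
    """Transistor budget if the electronics fill one square lenslet footprint."""
    return TX_PER_MM2 * lenslet_size_mm(window_mm) ** 2


def window_for_smallest_lenslet(f=FOCAL_LENGTH_MM, lam=WAVELENGTH_MM):
    """Window size minimising the lenslet size: d_w^2 = 9 f lambda / pi."""
    return math.sqrt(9.0 * f * lam / math.pi)


if __name__ == "__main__":
    for d_w in (0.03, 0.05, 0.08, window_for_smallest_lenslet()):
        print(round(d_w, 4), round(lenslet_size_mm(d_w), 4),
              round(transistors_under_lenslet(d_w)))
```

Below the minimising window size, shrinking the window forces a larger lenslet, which is the behaviour described above. The existence of a smallest lenslet size is what sets a ceiling on the connection density for a given geometry.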


Model  Results 


Design space trade-offs including optoelectronic window size, window spacing, Smart
Pixel dimensions and transistor count were determined for four optical interconnect configurations,
where a Smart Pixel is defined as having 4 device windows.

Case 1: a separate lenslet relay exists for each single optoelectronic device. Case 2: a
separate lenslet relay exists for each pair of optoelectronic devices. Case 3: a Smart Pixel consisting
of 4 optoelectronic devices has one lenslet relay. Case 4: four Smart Pixels have one lenslet
relay for a cluster of 16 optoelectronic devices. A grouping of Smart Pixels interconnected by one
lenslet relay will be defined as a Pixel Cluster, so in Case 4 the Pixel Cluster consists of four Smart
Pixels.


Figure  2:  Optical  Connection  Density 


Figure  3:  Lenslet  F/Number 
Figure 4: Size of Smart Pixel


Figure  5:  Number  of  Transistors  per  S/P 
Figures 2-5 describe the relationship between device window size and key photonic system
design parameters. Figure 2 shows the dependence of connection density on window size and
geometry. For the purposes of this work, a channel is defined as the optical I/O to one Smart Pixel
and consists of four optoelectronic device windows. In particular, the Pixel Cluster (Case 4)
highlights the impact of clustering optoelectronic I/O. In addition, upper bounds can be identified
where, above a certain device window size, a maximum occurs; this is due to the intermediate beam
waist between the lenslet arrays becoming smaller than the beam waist at the device planes. Figure
5 shows a second upper bound on the device window size as a function of transistor count; this
will impact the complexity of the Smart Pixel electronics.

The lower boundaries are determined by the required f/number and the manufacturability of
low f/number focusing diffractive lenses, as well as the total physical size of the array, which can
be determined from Figure 4.

Conclusions 

The results presented in this analysis show the trade-off between transistor count,
channel density (Smart Pixels/cm²), and window size for a diffractive lenslet array based relay
using four different Smart Pixel geometries. The analysis is restricted to the case of an interconnect
using 6.5 mm focal length lenslets operating at 850 nm. Several conclusions may be drawn from
this analysis. First, in each of the four cases studied there exists a certain minimum Smart Pixel
size, and thus maximum channel density, below which it becomes impossible to relay the signal
beams. This limit is a consequence of the 3w restrictions used in the analysis. Although higher
channel densities may be used, clipping of the signal beams will occur, which may cause problems
if the light propagates through several such relays. The minimum window size and maximum
transistor count will be determined by the f/number of the lenslets. Assuming the interconnect uses
8 level diffractive lenslets to maximize efficiency, the f/number will be limited to about 8.

The  analysis  also  illustrates  the  advantage  of  using  a  Pixel  Cluster  configuration  in  a  lenslet 
array based relay system; far higher channel densities can be achieved for the same window size.
Conversely,  smaller  windows  may  be  used  for  the  same  channel  density.  This  will  reduce  the 
amount  of  optical  and  electrical  power  required  to  operate  a  large  array  of  Smart  Pixels.  It  should 
be  noted,  however,  that  if  each  Smart  Pixel  requires  a  large  number  of  transistors,  the  Pixel  Cluster 
design  may  no  longer  be  the  optimum  choice. 
To compare monolithic and hybrid optoelectronic technology to electronics for interconnects, this
paper considers systems with the same manufacturing cost. For a given cost system, we calculate
the performance advantage that makes chip-to-chip optical interconnects competitive with electronic
wire bonds and solder bumps. Adjusting the number of I/O connectors by the ratio of the
technology defect densities forces the system costs to be identical. Balanced system design
principles and present defect densities further restrict hybrid optoelectronics to a logic/connector
ratio of 10^4 and monolithic integration to a ratio of ten.

Introduction 

The  many  performance  comparisons  between  electrical  and  optical  interconnects  in  terms  of  power 
dissipation,  skew,  and  density  largely  neglect  cost.  The  resulting  system  demonstrations  perform 
well,  but  are  too  costly  to  become  products.  This  paper  takes  a  different  approach  that  compares 
the  performance  of  systems  that  cost  the  same.  To  force  the  cost  to  be  the  same,  the  relative 
number  of  each  type  of  device  in  the  system  depends  on  the  device  manufacturing  defect  rates. 

The  device  types  included  in  the  comparison  are  CMOS  transistors,  solder  bumps  and  wire  bonds 
for  electronics,  and  monolithic  GaAs  or  hybrid  on  silicon  for  optoelectronics. 

The  first  section  describes  the  defect  rates  of  devices  and  their  temporal  trends.  The  next  section 
discusses  the  implications  of  the  different  defect  densities  on  the  organization  of  balanced  systems. 
The  following  section  performs  the  tradeoff  of  different  technologies  in  a  fixed  cost  system  to 
determine  the  necessary  relative  performance.  Before  concluding,  we  discuss  the  general 
implications  and  some  caveats  that  limit  the  generality  of  the  results. 

Defect  Rates 

Undesirable events occur during device manufacturing that affect device function. For instance, in
CMOS technology point defects from particulates in the air or impure materials create opens or
shorts in the photolithographic layers that define transistors. Thus, even though the present defect
level is relatively low, ever cleaner rooms and materials are necessary for the next generation of
CMOS technology.

The influence of defect density on cost also explains part of the push toward smaller linewidths; smaller
devices have a lower probability of encountering a manufacturing point defect. The SIA
roadmap expects the defect rate to decrease substantially over the next several years. The trends of
smaller linewidths and lower defect densities in combination imply that the transistor defect rate will
decrease substantially.

The  two  leading  chip-to-module  or  board  connector  technologies  are  wire  bonding  and  solder 
bonding.  Since  solder  bumping  is  a  simpler  process,  the  defect  density  per  electrical  connection  is 
three  orders  of  magnitude  lower. 
Optoelectronic device fabrication is similar to CMOS technology in that it uses the same basic
lithographic process of etching, implantation, and deposition. The major difference is that the
material quality is much worse. Epitaxially grown GaAs with Al, In, or P has material defect
densities around 100 per square centimeter. In addition, the optical and alignment capabilities of
free-space interconnect systems limit optoelectronic devices to about 10 microns on a side. Thus,
the device failure rate is quite high. If the transistors are made in the same material as the
optoelectronic devices, their smaller size gives them an order of magnitude lower device defect
rate.

The following table illustrates the present defect rates in interconnect and logic technology. The
defect density is the raw density of fatal defects in a given technology. Size accounts for the fact
that devices have different sizes that are susceptible to point defects. The solder bump and wire
bond densities are quoted per connection, so no size applies to them. Normalizing the raw density
by the device size produces a defect rate that is a function only of the device type.


                    Defect Density    Size [µm²]    Device Rate
CMOS Transistor     0.3/cm²           10            10^-8
Solder Bump         10^-5/bump        -             10^-5
Wire Bond           10^-3/bond        -             10^-3
Epitaxial GaAs      100/cm²           100           10^-4
                                      10            10^-5
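
The area-based entries of the table can be reproduced by multiplying the raw defect density by the device size. The Python sketch below does this for the CMOS and epitaxial GaAs rows; the function and variable names are ours, and the table rounds the results to the nearest order of magnitude.

```python
UM2_PER_CM2 = 1e8   # square microns in one square centimeter


def device_defect_rate(defects_per_cm2, size_um2):
    """Probability that a device of the given area contains a fatal defect."""
    return defects_per_cm2 * size_um2 / UM2_PER_CM2


cmos_transistor = device_defect_rate(0.3, 10)       # about 3e-8, tabulated as 10^-8
gaas_optoelectronic = device_defect_rate(100, 100)  # about 1e-4
gaas_transistor = device_defect_rate(100, 10)       # about 1e-5

if __name__ == "__main__":
    print(cmos_transistor, gaas_optoelectronic, gaas_transistor)
```

The factor of ten between the two GaAs rows is the size effect mentioned above: a transistor made in the same material occupies a tenth of the area of a 10 micron by 10 micron optoelectronic device.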

Balanced  Systems 

Balanced system design devotes limited resources to parts of a system in an attempt to optimize
some system metric. Balancing denotes the change in subsystem contributions to the metric as the
resources are shifted. With regard to a metric like performance per die area in microprocessors,
balanced design devotes circuit area to speeding up the most common tasks. Balancing is the design
philosophy that motivated the transition to RISC from CISC architectures.

Consider balancing the manufacturing costs of the present microprocessor architectures. To equalize
the yield per step, the device defect rate times the number of devices should be a constant. Thus, a
cost-balanced chip will have the ratio of chip connections to transistors adjusted to equal the ratio
of  the  transistor  to  connector  defects.  From  the  table  in  the  previous  section,  the  latter  ratio  is  ICP. 
This  corresponds  to  the  high-end  microprocessors  that  have  about  5  million  transistors  and  500 
wire  bonds,  an  identical  ratio. 

When the semiconductor industry moves to higher yield solder bump connector methods, yield
balancing will force chips to be smaller than if they were made with wire bonds. Since the
transistor size and defect density will decrease also, the effect may not be observed in a change in
the ratio of transistors to connections.

For optoelectronic systems, the defect densities imply that a balanced system will have 10^4 CMOS
transistors per hybrid optoelectronic I/O channel. For a monolithically integrated system, the high
electronic defect density implies that the ratio of transistors to optical I/O channels should be ten.
This is one reason why monolithic OEICs have been limited to small scales of integration.
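
The balancing rule (defect rate times device count held constant) can be written out directly. The Python sketch below uses the order-of-magnitude device rates from the table of the previous section; the dictionary keys and the function name are ours, chosen for illustration only.

```python
# Order-of-magnitude device defect rates taken from the table above.
DEFECT_RATE = {
    "cmos_transistor": 1e-8,
    "gaas_transistor": 1e-5,
    "solder_bump": 1e-5,
    "wire_bond": 1e-3,
    "optoelectronic_device": 1e-4,
}


def balanced_count_ratio(device_a, device_b):
    """Number of device_a per device_b when rate * count is equal for both."""
    return DEFECT_RATE[device_b] / DEFECT_RATE[device_a]


if __name__ == "__main__":
    # Hybrid: CMOS transistors per optoelectronic I/O channel.
    print(balanced_count_ratio("cmos_transistor", "optoelectronic_device"))
    # Monolithic: GaAs transistors per optoelectronic I/O channel.
    print(balanced_count_ratio("gaas_transistor", "optoelectronic_device"))
    # Solder bumps per hybrid optoelectronic channel at equal yield.
    print(balanced_count_ratio("solder_bump", "optoelectronic_device"))
```

These ratios come out near 10^4 for the hybrid case and near ten for the monolithic case, matching the figures quoted above, and the third ratio indicates that an equal-yield solder-bump system could carry about ten times as many connections as a hybrid optical one.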
Explaining the ratio of package connections to transistors in terms of balancing yields is a departure
from the usual approach of using Rent's rule. Rent's rule clearly fails for modern microprocessors
because of their large caches, and has always failed for systems with large amounts of
deterministic interconnect structure like memories and switches.


Cost-Performance  Tradeoff 

In the last section we explained why in a balanced system the yield of the connectors should be
roughly the same as the chip yield. To force two balanced systems to have the same cost, the one
with a solder-bump connector will have 10 times more I/O channels than one with a hybrid optical
connector. To be competitive in performance, the optoelectronic connectors must make up for
their lower number with performance advantages.

If the relevant performance metric is bandwidth, a hybrid optoelectronic device may compete by
offering 10 times the bandwidth of the solder connector. Since electrical driver power dissipation
and wire parasitics limit the electrical bandwidth, avoiding these in optics may allow the necessary
tenfold device improvement.

Another way to compensate for the smaller number of hybrid optoelectronic I/Os is to have an
architectural  advantage.  For  instance,  some  graphs  like  perfect  shuffles  and  hypercubes  have  large 
area  board  layouts  that  use  expensive  board  area.  By  using  hybrid  optoelectronic  I/O,  the  extra 
board  area  can  be  eliminated  and  the  cost  reduced. 

For monolithic integration, to balance yield the number of transistors must be 1000 times smaller than
in the CMOS or hybrid optoelectronic systems. Somehow, these fewer logic devices must give a
thousandfold performance advantage to be competitive.

Instead  of  performance,  we  can  make  cost-reliability  tradeoffs  by  identifying  the  reliability 
improvements  that  are  necessary  to  compensate  for  fewer  devices.  Unfortunately,  the  failure  rate 
of  optoelectronic  devices  is  not  as  well  known  as  their  performance. 

Discussion 

There  are  several  issues  that  must  be  kept  in  mind  when  interpreting  these  results.  First,  we  have 
only  considered  the  contribution  of  defect  density  to  cost.  In  reality,  optoelectronic  materials  and 
processes are more expensive than their electronic counterparts. Since yield, and hence cost, are
exponential in the defect rates, the transistor to I/O ratios should include the logarithm of the ratio of
the material and processing costs.

Another limitation of the analysis is the restricted technology domain. Because we omitted other surface
mount connectors like TAB, epoxy and microspheres, and capacitive coupling, we cannot compare
our results to these technologies.

Conclusions 

By relying solely on device defect rates and balanced system design, we have been able to explain
several common microelectronic organizational principles. One principle is the ratio of transistors
to I/O pins on a chip. Using the same logic applied to microoptoelectronic technology, we showed
that hybrid optoelectronic systems should have 10^4 transistors per optical I/O channel. Monolithic
optoelectronic integration should have ten transistors per optical I/O channel.

We are examining the integration of smart pixels in free space digital optics (FSDO) systems
to  create  advanced  architectures  for  digital  optical  computing.  These  systems  consist  of  large 
arrays  of  electronic  processing  or  switching  nodes  interconnected  by  optical  links.  We  are 
exploring  applications  that  require  very  high  information  input/output  capacity  and  very  high 
spatial  information  density  at  each  processing  module.  FSDO  offers  potential  improvements  by 
increasing  both  the  number  of  I/O  channels  per  chip  and  the  temporal  bandwidth  per  channel. 
Smart  pixel  technology  has  matured  enough  to  provide  device  access  for  practical  research  on  the 
integration  of  these  systems  in  an  optical  system.  This  research  provides  useful  feedback  for  device 
makers,  circuit  designers,  optical  system  designers  and  system  architects. 

We  are  continuing  experimentation  with  the  FET-SEED  chip  we  designed  at  the  FET-SEED 
design  workshop,  organized  by  the  Consortium  for  Optical  and  Optoelectronic  Technologies  for 
Computing  (CO-OP)  and  sponsored  by  AT&T  Bell  Labs  and  ARPA.  The  FET-SEED  technology 
monolithically  integrates  electrical  digital  logic  with  optical  detectors  and  modulators.  The  digital 
logic is made up of enhancement-mode MESFETs in a buffered FET logic (BFL) configuration.
Optical  receivers  and  transmitters  are  SEED  devices  that  provide  free-space  optical  I/O  channels 
for  the  digital  circuitry.  All  optical  channels  are  dual-rail  intensity  encoded.  The  chip  contains 
three  2x3  arrays  of  smart  pixel  circuits,  two  five-bit  shift  registers  and  isolated  test  circuits  for 
circuit  characterization.  Each  pixel  in  the  first  array  is  a  one-bit  memory  device  (D-flip-flop) 
where  the  input  data,  output  data  and  clock  signals  enter  or  leave  the  array  in  a  parallel  optical 
fashion.  This  array  is  a  small  example  of  a  2-D  optical  RAM  device.  The  second  array  consists  of 
exclusive-OR gates. Two arrays of optical data, A and B, enter the array resulting in the output
optical array, C = A ⊕ B, where ⊕ is the exclusive-OR operation. The exclusive-OR is a common
operation for address matching and filtering in switching networks. In the last array, each pixel is
a two-input, two-output circuit-switched bypass/exchange switch. Figure 1 shows a picture of this
circuit on the chip. This pixel has two data input and output channels, 1 and 2, and a control line,
C.  If  the  control  line  input  contains  the  optical  representation  of  a  digital  high,  data  entering 
channel  1  exits  on  channel  2  and  vice-versa  for  channel  2  (the  exchange  operation).  When  the 
control  line  is  low,  data  on  channel  1  exits  on  channel  1  and  likewise  for  channel  2  (the  bypass 
operation).  This  switch  is  the  basic  building  block  for  synchronous  circuit-switched  multistage 
interconnection  networks,  such  as  the  shuffle-exchange  network. 
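
The logical behaviour of the exclusive-OR and bypass/exchange pixels described above can be summarised in a few lines. This is only a behavioural sketch: it models the digital function, not the dual-rail optical encoding or the gate-level circuit on the chip, and the function names and data values are made up for illustration.

```python
def bypass_exchange(in1, in2, control):
    """Return (out1, out2) for one bypass/exchange pixel.

    control high -> exchange (inputs swap channels);
    control low  -> bypass (inputs pass straight through).
    """
    if control:
        return in2, in1
    return in1, in2

def xor_array(a, b):
    """Element-wise exclusive-OR of two equally sized binary arrays."""
    return [[x ^ y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

# A 2x3 example matching the array size on the chip (data values are made up).
A = [[1, 0, 1], [0, 0, 1]]
B = [[1, 1, 0], [0, 1, 1]]
print(xor_array(A, B))
print(bypass_exchange(1, 0, control=1))  # exchange
print(bypass_exchange(1, 0, control=0))  # bypass
```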

We  created  the  above  circuits  by  wiring  together  NOR  gates,  optical  receivers  and  optical 
transmitters  previously  designed  and  tested  by  AT&T.  Our  testing  of  the  individual  circuits  gave 
the results predicted by AT&T [1,2]. Simulations show operation of these circuits past 7 GHz [3].
We  tested  the  electronic  logic  circuitry  up  to  1  MHz  and  the  optoelectronic  circuitry  to  10  kHz.  For 
optoelectronic  testing,  we  set  our  optics  on  a  slotted  baseplate  similar  to  the  ones  used  at 
Heriot-Watt,  AT&T  and  Optivision  [4,5].  The  baseplate  allowed  us  to  quickly  and  conveniently 
align  beamsplitters,  lenses,  waveplates  and  diffractive  optic  elements  (DOEs)  to  produce  arrays  of 
5 micron spot sizes in our 20 µm² windows. Using this set-up we have operated our shift register,
receiver and transmitter circuits. So far our most sophisticated test was to set up an optical
communication  link  between  isolated  transmitter  and  receiver  circuits,  shown  in  Fig.  2.  A  pair  of 
optical  beams  reads  the  state  of  a  transmitter  circuit  being  driven  by  an  electrical  digital  signal 
coming  from  a  pin-out.  An  optical  imaging  system  above  the  chip  directs  the  reflected  optical 
beams from the transmitter back onto the chip into a receiver circuit. The receiver sends an
electrical digital signal out to a pin-out. Figure 3 shows the set-up for and results from operating
our  shift  registers. 

Our short-term goal is to optically connect the smart pixels to demonstrate the building blocks
for  more  complex  circuits.  These  elements  can  be  combined  to  build  an  elementary 
shuffle/exchange  network,  an  address  decoder  or  a  serial-to-parallel  converter.  Our  optical 
systems  use  a  single  objective  lens  to  optically  address  the  smart  pixel  chip.  These  systems  require 
precision  micro-optics  designs  to  overcome  the  conflicting  requirements  of  small  spot  sizes  in  a 
large field of view. Our continuing work is concerned with the design, simulation and
testing  of  these  systems  to  determine  the  practical  limits  for  future  smart  pixel  chip  designs  and 
optical  smart  pixel  systems. 

We  are  exploring  other  FET-SEED  fabrication  methods  in  a  recent  program  being  pursued 
jointly  with  MIT.  We  have  designed  another  chip  in  which  the  electrical  components  will  be 
fabricated using the MOSIS/Vitesse facility and the SEED detectors and modulators will be
regrown  onto  the  chip  at  MIT.  The  Vitesse  process  can  combine  both  depletion  and  enhancement 
mode  MESFETs  on  a  GaAs  chip.  This  combination  allows  for  circuits  that  require  only  a  single 
power  supply  voltage,  no  level  shifting,  less  layout  area  and  less  power  consumption. 

For  a  future  smart  pixel  fabrication,  we  would  like  to  include  a  design  for  a  digital  optical 
cellular  image  processor  (DOCIP).  The  DOCIP  cells  are  simple  processing  elements  (PEs)  that 
each  operate  on  individual  pixels  in  an  image.  In  our  proposed  architecture,  each  PE  can  store  two 
bits  and  perform  complement  and  logical-OR  functions.  By  providing  optical  links  between  PEs, 
the  array  can  execute  general  numerical  algorithms  and,  in  particular,  image  processing  routines 
based  on  binary  image  algebra  [6]. 
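
Complement and logical-OR together form a functionally complete set, which is one reason such a small PE can be general. The sketch below illustrates this on small binary images by deriving AND from the two primitives via De Morgan's law. It is not the DOCIP instruction set; the function names and the test images are assumptions for illustration.

```python
def complement(image):
    """Invert every pixel of a binary image (list of rows of 0/1)."""
    return [[1 - p for p in row] for row in image]

def logical_or(a, b):
    """Pixel-wise OR of two equally sized binary images."""
    return [[x | y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def logical_and(a, b):
    """AND built only from the two primitives: a AND b = NOT(NOT a OR NOT b)."""
    return complement(logical_or(complement(a), complement(b)))

# Made-up 2x2 binary images.
image_a = [[1, 0], [1, 1]]
image_b = [[1, 1], [0, 1]]
print(logical_and(image_a, image_b))
```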

Introduction

The principle of transmitting information using free-space optics poses a serious challenge to collecting
diagnostic signals within large-scale digital photonic systems [1]. High concentrations of parallel optical channels and
localized electronic signals bring about great difficulties in monitoring high speed operations using conventional
contact techniques. Up to now, the typical diagnostic procedure was to sample a portion of the light reflected from the
output modulators with a system viewport and form a remote magnified image. A high-sensitivity photodetector was
then sequentially aligned with each spot to transform the signal to an electronic format that could be monitored using
an electronic oscilloscope. This sampling procedure and other electro-optic sampling techniques [2,3] developed for
high speed systems are too time consuming when many signals must be actively monitored.

The  sampling  technique  we  have  implemented  is  based  on  the  concept  of  strobe  photography  whereby  a  subject  is 
illuminated for a brief interval using a high intensity light source, thus selectively capturing an image of an entire
system  state.  This  strobe  method  is  an  ideal  match  for  our  modulator-based  photonic  systems.  Within  the  referenced 
system,  a  current  modulated,  semiconductor  laser  with  diffractive  optics  generates  a  two-dimensional  beam  array. 
The  beams  are  imaged  onto  modulators  where  electronically  processed  signals  modify  the  optical  absorption.  The 
modulated  beams  then  serve  to  transmit  information  to  the  subsequent  opto-electronic  device  array.  Current  system 
prototypes have physically separated paths for the incoming and outgoing data making it possible to monitor the
processed  signals.  Thus,  in  order  to  selectively  examine  the  temporal  evolution  of  the  optical  signal  reflected  from  the 
modulators,  a  set  of  pulsed  readout  beams  (acting  as  strobes)  must  be  introduced  to  slowly  scan  through  a  repeated 
pattern  embedded  in  the  data  stream.  By  scanning  slowly  enough,  the  resultant  output  can  be  captured  by  a  video 
camera. 

The  system  operation  resembles  that  of  a  high-speed,  multichannel  oscilloscope.  The  system  probe  is  composed 
of a standard video camera that is able to collect an image of a large number of optical channels during each sample
interval.  As  a  demonstration  of  the  system  capability,  we  electrically  drove  a  modulator  array  at  0.5  to  4.0  gigabit  per 
second  data  rates  and  recorded  the  optical  waveforms. 

Hardware 

The  primary  components  of  the  multichannel,  optical  oscilloscope  are  shown  in  figure  1.  The  electronic 
components  of  the  system  serve  to  generate  a  repeated  data  pattern  at  the  modulator  and  to  synchronize  an  optical 
probe  pulse  that  is  slowly  scanned  in  time  relative  to  the  pattern.  Optical  and  video  components  are  responsible  for 
generating  and  imaging  the  optical  pulses,  sampling  a  portion  of  the  output  light,  and  digitizing  the  image. 

The  subject  to  be  examined  is  typically  a  high-speed,  electronic  processing  circuit  with  integrated  multiple 
quantum well modulators. The data generator module produces either an optical or electrical signal that, at a
minimum, provides a clocking mechanism for synchronizing activity within the circuit. The laser and associated
beam  array  generator  create  an  array  of  beams  that  are  imaged  onto  the  modulator  windows.  Under  normal  system 
operation,  the  laser  generates  an  uninterrupted  intensity  modulated  square  wave  that  is  synchronized  with  each  data 
bit  as  it  is  presented  at  the  modulator. 

The synchronization is set so that a probe pulse is generated at 1/Nth of the bit rate of the N-bit pattern, with a
pulse duration of no more than a few hundred picoseconds. This pulse is slowly delayed such that it samples
the  entire  data  pattern  over  a  period  of  a  few  seconds.  The  delay  sequence  is  limited  by  the  speed  of  the  video 
acquisition  system  and  is  about  10  to  15  samples  per  second  in  our  implementation. 
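
The scanning scheme described above is a form of equivalent-time sampling: one sample is taken per pattern repetition, and the probe delay is advanced a little each time. The following is a minimal sketch under idealised assumptions (a probe pulse of zero width and a perfectly repeating pattern). Times are kept in integer picoseconds to avoid rounding problems, and all names and numbers are illustrative rather than taken from the hardware.

```python
def strobe_scan(pattern_bits, bit_period_ps, delay_step_ps):
    """Sample a repeating bit pattern with a probe whose delay grows each sample.

    Returns one sample per delay step until the whole pattern has been covered.
    """
    pattern_duration_ps = len(pattern_bits) * bit_period_ps
    n_samples = pattern_duration_ps // delay_step_ps
    samples = []
    for k in range(n_samples):
        delay_ps = k * delay_step_ps
        # Which bit of the repeating pattern the probe lands on at this delay.
        bit_index = (delay_ps // bit_period_ps) % len(pattern_bits)
        samples.append(pattern_bits[bit_index])
    return samples

def scan_time_s(n_samples, samples_per_second):
    """Wall-clock time to acquire the scan at a given video sampling rate."""
    return n_samples / samples_per_second

pattern = [1, 0, 1, 1]  # made-up 4-bit pattern
trace = strobe_scan(pattern, bit_period_ps=1000, delay_step_ps=250)
print(trace)
print(scan_time_s(len(trace), samples_per_second=10))
```

Each bit appears four times in the trace because the delay step is a quarter of the bit period. The slow video rate sets the total acquisition time, not the time resolution.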

Software Application

An  Apple  Macintosh  Quadra  840AV  was  chosen  as  the  application  platform  for  controlling  the  oscilloscope.  The 
application  software  can  be  separated  into  two  basic  modules:  the  video  acquisition  and  analysis  routines,  and  the 
oscilloscope  display  routines.  Each  set  provides  a  user  interface  for  adjusting  parameters  and  options. 

The  video  module  is  responsible  for  digitizing  and  storing  a  video  frame,  extracting  the  intensity  values  from  the 
designated  regions  of  interest,  and  controlling  the  synchronization  of  the  data  and  probe  signal  generators.  In  this 
implementation,  the  video  digitization  is  highly  integrated  with  the  workstation.  Synchronization  can  either  be 
controlled  by  the  processor  and  communicated  to  a  programmable  delay  generator  using  the  general  purpose 
interface  bus  (GPIB)  or  implemented  by  tightly  coupling  the  operation  of  both  signal  generators. 

The  user  interface  permits  creation  and  manipulation  of  regions  of  interest  in  the  video  frame.  Using  an 
interactive  cursor,  the  user  either  selects  an  arbitrarily  distributed  set  of  regions  or  defines  an  array  of  regularly 
spaced  regions.  The  region  size  can  be  adjusted  from  a  single  pixel  to  an  arbitrary  size  square  of  pixels.  To  aid  the 
user  in  accurately  locating  the  region,  a  zoom  feature  can  display  a  magnified  region  surrounding  the  selection  point. 
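
As an illustration of the region-of-interest handling described above, the sketch below generates a regularly spaced array of square regions and extracts one intensity value per region from a frame. Averaging the pixels in each region is an assumption, since the text only says that intensity values are extracted. The frame is a plain list of rows, and the function names and pixel values are hypothetical.

```python
def regular_rois(origin, pitch, shape):
    """Top-left corners of a regularly spaced array of regions.

    origin: (row, col) of the first region; pitch: (row, col) spacing;
    shape: (number of rows, number of columns) of regions.
    """
    (row0, col0), (d_row, d_col), (n_rows, n_cols) = origin, pitch, shape
    return [(row0 + i * d_row, col0 + j * d_col)
            for i in range(n_rows) for j in range(n_cols)]

def roi_mean(frame, corner, size):
    """Mean intensity of a size x size square region with the given top-left corner."""
    row0, col0 = corner
    pixels = [frame[r][c]
              for r in range(row0, row0 + size)
              for c in range(col0, col0 + size)]
    return sum(pixels) / len(pixels)

# Made-up 4x4 frame: a bright spot top-left and a dimmer one bottom-right.
frame = [
    [200, 200, 10, 10],
    [200, 200, 10, 10],
    [10, 10, 90, 90],
    [10, 10, 90, 90],
]
corners = regular_rois(origin=(0, 0), pitch=(2, 2), shape=(2, 2))
print([roi_mean(frame, corner, size=2) for corner in corners])
```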

The  intensity  waveforms  are  presented  in  a  manner  similar  to  oscilloscope  displays.  Once  the  selected  regions  are 
specified, the waveforms are displayed as an array of scans or overlaid on a common plot. The time scale and vertical
axis  of  the  scan  region  are  user  adjustable.  Auto-scaling,  triggering  and  data  storage  functions  are  also  provided. 

Demonstration 

To demonstrate the capabilities of the multichannel optical sampling oscilloscope, a 2x4 array of independent
electrically driven, differential modulators [4] was monitored while operating at gigahertz rates. The synchronization
between the data signals and the probe pulse was fixed by using two frequency stabilized analog signal generators
synchronized to a common clock to trigger digital data and pulse generators. In figure 2, four of the differential
modulators were driven by a data generator (16 bit words) at 1 Gb/s, and four were driven by 1 GHz square waves (2
Gb/s 1,0,1,0 pattern). The voltage on the modulators varied over a 3.3 V swing which, coupled with the shift in
operating wavelength caused by heating from nearby 50 Ω terminating resistors, led to a poor contrast ratio between
on and off states. The data generator was triggered at a bit rate of 1,000,000,002 Hertz, while the probe pulse operated
at a frequency of 62.5 MHz. The probe pulse is thus scanned through the data pattern at a rate of about 2 bits per
second; when sampled about 10 times per second, the sample-to-sample offset is about 200 ps. The modulators were
operated at data rates from 0.5 to 4 Gb/s (limited by the signal generator) throughout which the oscilloscope
responded with similar results. The 1 and 2 Gb/s data are presented in figure 2 since they show more sharply defined
edges than the higher speed waveforms. Although only 16 waveforms were available, a larger number of modulators,
say 16x16, could be monitored with equivalent performance.
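
The scan rate and sample offset quoted above follow directly from the two frequencies. The short sketch below reproduces the arithmetic; the function names are hypothetical, and the sampling rate of 10 per second is the approximate figure given in the text.

```python
def scan_rate_bits_per_s(bit_rate_hz, probe_rate_hz, word_length_bits):
    """Rate at which the probe drifts through the pattern, in bits per second.

    The probe fires once per nominal word, so the drift is the difference
    between the actual bit rate and word_length * probe rate.
    """
    return bit_rate_hz - word_length_bits * probe_rate_hz

def sample_offset_s(bit_rate_hz, probe_rate_hz, word_length_bits, samples_per_s):
    """Time offset within the pattern between successive video samples."""
    drift = scan_rate_bits_per_s(bit_rate_hz, probe_rate_hz, word_length_bits)
    bits_per_sample = drift / samples_per_s
    return bits_per_sample / bit_rate_hz

drift = scan_rate_bits_per_s(1_000_000_002, 62_500_000, 16)
offset = sample_offset_s(1_000_000_002, 62_500_000, 16, samples_per_s=10)
print(drift)          # bits per second
print(offset * 1e12)  # picoseconds per sample
```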

Figure  2  shows  the  application  software  in  operation.  Two  video/image  frames  show  the  illuminated  modulator 
array  and  the  region  selection  window.  Sixteen  modulator  waveforms  obtained  from  the  sampled  video  signal  are 
shown in the rightmost window. For comparison, the intensity waveform of one modulator obtained from a high-speed
photodetector is shown on the bottom left. One can see that the 1 Gb/s data has fast edges similar to the
photodetector  oscilloscope  scan.  In  addition,  a  video  signal  of  system  operation  was  collected  using  a  video  tape 
recorder  and  analyzed  by  the  optical  oscilloscope,  illustrating  a  means  of  storing  diagnostics  for  later  analysis. 

Figure 1. Schematic of electronic and video hardware modules.
Figure 2. Demonstration of 1 and 2 Gb/s multichannel operation showing oscilloscope interface.

Introduction

Development  of  robust  and  reliable  optical  systems  is  essential  in  order  to  utilize  the 
connectivity  and  parallelism  of  optics  in  conjunction  with  electronics  in  smart  pixel  information 
processors.  Bulk  optical  imaging  systems  utilizing  custom  and  off-the-shelf  optics  and 
optomechanics can provide some solutions to optical interconnections in laboratory experiments and
system  demonstrations.  However,  there  are  optical  and  size  limitations  to  classical  imaging  techniques 
that  can  be  overcome  with  the  use  of  hybrid  bulk  and  micro  optic  imaging.  Use  of  large  arrays  of 
microlenses  is  an  effective  method  of  interconnecting  large  dilute  arrays  of  smart  pixels.  The  micro 
channel  technique  for  4-f  imaging  of  focal  spot  arrays  and  device  planes  establishes  a  single  optical 
path for each channel in the array. This type of one-to-one imaging may be usefully implemented in
various imaging systems. In addition to simple one-to-one imaging, arrays of focal spots originating
from different sources must be combined together. For example, signal inputs incident on a smart pixel
array  must  be  combined  with  the  clock  array  that  is  used  to  read  the  state  of  the  devices.  We  have 
investigated  bulk  and  microoptic  components  and  subsystems  to  be  applied  to  optical  computing 
applications.  This  has  involved  study  of  the  practical  and  theoretical  performances  of  the  various 
components.  The  progression  of  our  work  in  implementing  free-space  smart  pixel  imaging  systems 
establishes  the  techniques  that  will  utilize  micro  optical  components  in  practical  system  subassemblies. 

Beam  combination  using  space  multiplexing 

Previous  work  [1,2]  has  utilized  the  method  of  space  multiplexing  in  free-space  imaging 
systems  interconnecting  arrays  of  (SEED)  devices.  Figure  1  schematically  shows  the  method  used  for 
interlacing  two  arrays.  Patterned  micro-mirrors  are  used  in  the  image  plane  to  spatially  divide  the 
array  and  polarization  state  is  used  to  separate  the  input  and  output  beams.  Experimentally  we  have 
found  that  imperfections  on  the  surfaces  of  image  plane  components  can  have  a  significant  effect  on 
the  uniformity  of  the  incident  array.  In  addition,  non-uniform  power  loss  can  be  caused  by  clipping 
at the edges of the micro-mirrors. Practical investigation of the use of space multiplexing in this
manner has established that it can be an effective method of beam combination with no significant
addition  to  the  power  nonuniformity  if  the  focal  spots  to  be  multiplexed  are  spatially  separated  from 
the  micro-mirror  edges  and  the  components  are  clean  and  flawless.  In  a  smart  pixel  system,  the 
optical  windows  can  be  designed  to  be  spatially  separated  in  a  dilute  array  and  hence  this  is  an 
appropriate configuration to use. This is in contrast to the use of space multiplexing of two focal
spots onto two halves of a 5x10 µm S-SEED window, where the two spots are placed side by side after
combination at the edges of the micro-mirrors.

Space  multiplexing  with  microlenses 

Given  that  space  multiplexing  is  a  viable  method  of  beam  combination  in  particular  system 
configurations,  the  progression  is  to  utilize  the  attributes  of  microlenses  to  provide  compact  and  stable 
configurations.  Figure  2  shows  a  schematic  of  a  microlens  version  of  the  beam  combination  shown  in 
Figure 1 (note that the diagrams are drawn to different scales; the beamsplitter cubes are of equivalent
size).  It  is  clear  that  the  short  focal  lengths  reduce  the  subsystem  footprint  significantly.  In  addition, 
if  the  optical  components  are  integrated  together,  Fresnel  losses  are  reduced.  On-axis  imaging  with 
microlenses  eliminates  the  problem  of  reduced  performance  of  off-axis  field  points,  and  the  array  size 
is  not  limited  by  the  field  curvature,  a  problem  commonly  encountered  in  bulk  imaging  systems.  An 
issue that must be considered in the use of microlenses is clipping at the microlens aperture and the
Gaussian propagation of the microbeams [3]. McCormick et al. [4] have shown that the effect of
clipping can be significant on the system tolerances. Our work has also confirmed that the most
appropriate region of operation in the type of system we are interested in is that in which the Gaussian
beam is effectively unclipped (less than 5% of the power falls outside the aperture boundary).
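
The 5% criterion above can be turned into a rule of thumb for the aperture size. The sketch below assumes a fundamental-mode Gaussian beam centred on a circular aperture, for which the fraction of power falling outside radius a is exp(-2 a^2 / w^2), with w the 1/e^2 intensity radius. The text does not state the aperture shape or beam definition, so these are assumptions, and the function names are hypothetical.

```python
import math

def clipped_fraction(aperture_radius, beam_radius):
    """Fraction of power outside a centred circular aperture.

    beam_radius is the 1/e^2 intensity radius of the Gaussian beam.
    """
    return math.exp(-2.0 * (aperture_radius / beam_radius) ** 2)

def min_aperture_ratio(max_clipped_fraction):
    """Smallest aperture-to-beam radius ratio that keeps clipping at the limit."""
    return math.sqrt(-0.5 * math.log(max_clipped_fraction))

ratio = min_aperture_ratio(0.05)
print(round(ratio, 3))
print(clipped_fraction(ratio, 1.0))
```

Under these assumptions the aperture radius needs to be roughly 1.22 times the beam radius at the lens for the beam to count as effectively unclipped.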

Required attributes of the microlenses are low aberrations, high efficiency and good uniformity.
Each microlens is used on-axis so the aberration should be limited to spherical for rotationally
symmetric elements. Spherical aberration accumulated over several microlens passes can introduce
significant power loss. This is due to poor coupling of the aberrated rays into the optical windows and
thus should be kept to a minimum. The number of passes required is also an issue in terms of
reflection and absorption losses. Another issue that will have a significant effect on system
performance is the array uniformity. Variation of the effective focal lengths across the array has
important system implications. If the images formed by each microlens are not in the same plane
there will be a resultant effect on the array uniformity at the device plane. We have measured the
focal length uniformity of various arrays and found their variation to be 5 to 20%. Large focal
length tolerances will reduce the tolerances available for other parameter variation. This is of
consequence when establishing the requirements for other system components and their construction.
These variations in parameter tolerances of available microlens arrays must be considered explicitly in
the architecture choices and design of systems that utilize them.

Space multiplexing with microlenses

In  addition  to  consideration  of  the  issues  outlined  above,  it  is  necessary  to  develop  reliable  and 
physically  robust  techniques  that  will  allow  the  implementation  of  these  methods  in  practical  systems. 
The  interconnection  density  required  by  the  type  of  smart  pixel  system  that  we  are  interested  in 
demands use of 100-250 µm diameter and pitch, low f-number microlenses, each used on-axis
providing near diffraction limited imaging. Gaussian beam propagation must be considered when
apertures of this magnitude are used. The effects of diffraction are further compounded by the
requirements for spacings in "collimated" space that will accommodate the bulk components
(beamsplitter cube and retardation plates).

A  generic  implementation  of  this  microlens  imaging  utilizes  bulk  optical  components  such  as 
retardation  plates  and  beamsplitter  cubes  along  with  arrays  of  microlenses.  Arrays  of  micro 
components  such  as  patterned  micro-mirrors  or  spatially  variant  micro-retardation  plates  may  be  used 
in conjunction to provide the desired interconnection. Figure 2 shows a schematic of one type of
generic system that combines arrays of beams together using space/polarization multiplexing. This
configuration would be useful for (theoretically lossless) combining of up to four focal spot arrays with
a single component. It can be seen that the optical paths traversed by the micro channel routes
(a single channel in each array is shown for clarity) pass through different arrays of microlenses as illustrated
in the figure. The optical paths laid out in Figure 3 illustrate that there are common components within
each micro channel route, though the optical system encountered by each spot array being imaged is
different. This factor requires careful consideration in the system design. Analysis of the nominal
optical system establishes the issues that must be considered in order to approach practical and
tolerant solutions. Further detailed analysis of the specific system requirements provides essential
information for the implementation of these component technologies. In a simplified system, the
minimum  tolerance  issues  that  must  be  considered  in  the  construction  of  a  single  monolithic,  hybrid 
component  can  be  separated  into  three  areas: 

1) Errors in microlens fabrication

The  measured  (or  specified)  non-uniformity  of  focal  lengths  of  the  microlenses  can  be  used  to 
express  the  lens  fabrication  errors.  The  "worst  case"  establishes  the  system  limitations.  Measurement 
of  the  exact  parameters  (element  thickness  and  curvature)  is  difficult,  so  a  single  figure  that  accounts 
for  the  non-uniformity  in  terms  of  the  tolerance  on  the  element  curvature  may  be  used. 

2) Errors in beamsplitter fabrication

Custom or off-the-shelf beamsplitter cubes are specified in terms of their dimension and
parallelism. The linear dimension tolerances translate to longitudinal errors in microlens spacing.
The parallelism of the cube is determined by the angular accuracy of the 45 degree prisms it is
made from. This error results in tilt of the microlenses if they are to be affixed to the beamsplitter
face.

3)  Errors  in  hybrid  component  construction 

Fabrication  of  the  hybrid  unit  can  result  in  two  major  errors:  microlens  decenter  and 
longitudinal  error  of  focal  plane  array  elements  (for  example,  patterned  micro-mirrors). 

An  example  of  the  raytraced  tolerance  analysis  output  is  the  plot  in  figure  4.  This  shows  the 
minimum detector radius that must be used for a 2 µm radius input spot when decenter tolerances are
applied to all of the different optical paths in a 200 µm pitch, f/1, telecentric beam combination system
as outlined above. This type of data is used in determining the system design choices and trade-offs
with  other  system  components. 

This paper discusses the development of current techniques and issues that must be considered
in  the  tolerance  and  performance  analysis.  The  results  from  these  analyses  are  used  directly  in  the 
development  of  fabrication  methods  that  will  allow  successful  use  of  the  technology  to  provide 
practical  system  components. 

Figure 1. Schematic of bulk space multiplexing

Figure  2.  Schematic  of  generic  optical  beam  combining  unit  (note  scale  relative  to  Figure  1) 
Figure  3.  Unfolded  optical  paths  for  figure  2  showing  common  components 
Figure  4.  Calculated  required  detector  radius  for  one  particular  configuration 

Introduction:

For optical processing systems, the split and join operations are the most frequently needed
operations. For an optical implementation of these operations, several properties are
favourable: the optical system should be small, modular, wavelength tolerant and require a
minimum number of different fabrication technologies.

In the following we demonstrate a simple system which satisfies all of these demands. The
system consists of two identical modules; each module consists of four identical lenses and two
identical prisms. In this way, the total number of different components is two. Since the lenses
are part of an array, we only have to handle two components per module: one microlens array
and one symmetric prism. The system can be operated in two modes: the shuffle mode allows
the combination of four different input planes in one output plane, while the multiplex mode
generates four identical copies from one input plane. Both of these modes are necessary for the
realisation of symbolic substitution algorithms [1]. The operating mode can be chosen by the
lateral position of the input plane(s) and the angular spectrum of the light source.
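
The two operating modes can be sketched at the data level as follows. This is only a logical model under stated assumptions: planes are small intensity arrays, the overlay in shuffle mode is treated as a pixel-wise sum (incoherent superposition with ideal, lossless imaging), and the function names are hypothetical. It says nothing about the optics that select the mode.

```python
def shuffle(planes):
    """Shuffle mode: overlay four equally sized input planes into one output plane.

    The overlay is modelled as a pixel-wise intensity sum (an assumption).
    """
    if len(planes) != 4:
        raise ValueError("shuffle mode expects four input planes")
    rows, cols = len(planes[0]), len(planes[0][0])
    return [[sum(p[r][c] for p in planes) for c in range(cols)]
            for r in range(rows)]


def multiplex(plane):
    """Multiplex mode: produce four identical, independent copies of one plane."""
    return [[row[:] for row in plane] for _ in range(4)]


if __name__ == "__main__":
    # Four 2x2 input planes, each with a single lit pixel in a different position.
    corners = [
        [[1, 0], [0, 0]],
        [[0, 1], [0, 0]],
        [[0, 0], [1, 0]],
        [[0, 0], [0, 1]],
    ]
    print(shuffle(corners))
    print(multiplex([[1, 0], [0, 1]]))
```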

Standard module: Each standard module consists of four planar gradient-index microlenses
(PML) and two prisms which are realized as one symmetric prism. The focal length of the
PML is chosen to be equivalent to the thickness of the glass substrate. The lenses have been
produced by Na-Ag ion exchange, the prisms by thermal molding and casting. Details of
both processes have been reported previously [2][3]. Typical dimensions are: focal length
f = 1500 µm (in glass), lens diameter d = 200 µm, lens pitch p = 300 µm, prism angle 8°. Note
that it is possible to obtain a required (de)magnification of the input plane(s) by suitable choice
of the design parameters. This is important if emitter/receiver arrays of different pitches are
used.


Fig 1 Standard module

Cascaded system: One standard module alone can be used to perform the one-dimensional
overlay of dataplanes. This has been demonstrated in [2]. If a second module is used in such a way
that both prisms face each other and the second prism is rotated by 90° with respect to the first
prism, it is possible to perform a two-dimensional overlay of four input planes. The bottom
part of fig. 3 illustrates that it is possible to use the system backwards. In this case, four
identical copies of the input plane are produced. Note that the angular spectra are different in
the input and output plane.

Fig. 3 Experimental setup

Experimental setup: Both modes of operation are demonstrated. The input device is a
commercially available LCD panel (from an Epson VP-100PS video projector) which is
controlled by a computer. Since the pixel size is too large (60 µm × 70 µm) for use in a
microoptical system, it is demagnified by a factor of 10 with a microscope objective. This
demagnified image is the input for the microsystem. The image of the illumination mask then
has to be located between the planes of the microlenses. Since this has to be done with the
same microscope objective as is used for the demagnification of the input plane, it is necessary
to adjust the focal lengths of the condenser lenses accordingly. The output plane is magnified
onto a CCD camera with a second microscope objective. The CCD camera is connected to a
frame grabber and computer.

Fig 4a Input for shuffle mode

Fig 4b Shuffle output

Fig 4 shows the input and output plane for the shuffle mode operation, illustrating that an array
of 16 × 16 spots can be resolved reliably with two cascaded modules. Fig 5 shows input and
output planes for the multiplex mode. The original size of the input and output planes is
700 µm × 500 µm. For the shuffle mode, a magnification by a factor of 2 was used; for the
multiplex mode, a system with a demagnification of 1/4 was used.


Fig  5a  Input  plane  for  multiplex  mode 


Fig 5b Output plane of multiplexing system

The fabrication of miniaturized refractive prisms for use in free-space optical systems is a
relatively new field in component fabrication. Thermal imprinting in PMMA has reduced
flexibility in the angle choice and induces stress in the substrate. The optical quality of the
surface is given by the quality of the master. Synchrotron or proton irradiation of PMMA gives
the advantage of arbitrary deflection angles of prisms with depths of several hundred microns.
Problems arise from the fact that the roughness of a thick metal mask (>20 µm) is directly
copied into the prism surface. This paper describes a new fabrication technique for deflection
elements whose surface quality is better than the mask's: the surface roughness is better than 20
nm. The method is based on deep proton irradiation; the new idea is that during irradiation the
sample is moved relative to a fixed mask. This produces a polishing effect of the protons on the
prism surface, yielding the above-mentioned smoothness.


The process is based on the method of deep proton irradiation with development of the irradiated
structures. In addition to the irradiation, the PMMA sample is now moved under its mask
during irradiation. The mask, however, is installed in a fixed position relative to the
beam (Fig. 1). As depicted in Fig. 1, a circular aperture moving over the sample can
be used to create a variety of structures which can all be situated on the same block.
Moving the sample several times along the same path causes the roughness of the mask
to be spatially blurred in the PMMA substrate. The expected surface profile can be seen from
a simulation assuming a perfect circle of 125 µm diameter moving up and down along a
straight line. The curve in Fig. 2 shows lines of equal dose deposition (in steps of 100 J/cm³
from 100 J/cm³ to 2100 J/cm³, decreasing from left to right) representing the expected
vertical shape of the PMMA structure. The simulation is performed for a 500 µm thick
PMMA substrate irradiated with a proton energy of 7 MeV and a dose of 0.6×10^[illegible]
ions/cm² using a Monte-Carlo simulation method.

Fig. 1 Irradiation setup with movement of the sample behind the mask (labels: PMMA, mask,
single hole, motor driven target (x, y, rotation), irradiated structure)

The smoothing effect of this method is most obvious for a mask of poor quality.
In the experiment described in the following, a mechanically drilled hole was therefore used
as the mask. The hole is situated in a 250 µm thick copper plate and has an aperture diameter
of 125 µm. (The choice of this diameter respects the option of a monolithic integration with
single mode fibers and microlenses.) The tolerances in the hole diameter as well as the mask
roughness are in the range of ±5 µm. During irradiation the PMMA was moved behind that
mask in a straight line several times forward and backward over a distance of 500 µm using high
precision motor drives. The exact dose in the irradiated volume is controlled by the current
induced by the protons entering an isolated sheet of metal behind the PMMA and was
3×10^[illegible] C/mm (detected charge per displacement of the motor drive). After irradiation a
development follows, yielding vertical slits with a width corresponding to the size of the mask
aperture.

Fig. 2 Simulation of dose distribution for a straightly moving sample behind a circular
aperture of 125 µm diameter (horizontal axis: distance from edge [µm]).
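
The lateral shape of the deposited dose for a scanned circular aperture can be sketched with simple geometry. This is not the Monte-Carlo simulation referred to in the text: it ignores depth-dependent energy loss and lateral straggling, and assumes an ideal circular aperture, uniform beam current density, constant scan speed, and a point far from the turning points of the scan. Under those assumptions the exposure time at a given lateral offset is proportional to the chord length of the aperture at that offset. The function name is hypothetical.

```python
import math


def relative_dose(offset_um, aperture_diameter_um=125.0):
    """Relative dose at a lateral offset from the scan axis (1.0 on the axis).

    A point at lateral offset x is exposed while the chord of the aperture at
    that offset passes over it, so the dose scales with 2*sqrt(R^2 - x^2).
    """
    radius = aperture_diameter_um / 2.0
    if abs(offset_um) >= radius:
        return 0.0
    return math.sqrt(radius * radius - offset_um * offset_um) / radius


if __name__ == "__main__":
    for x in (0.0, 20.0, 40.0, 60.0, 62.5):
        print(f"offset {x:5.1f} um -> relative dose {relative_dose(x):.3f}")
```

Because each point integrates the beam along a whole chord, local irregularities of the aperture edge are averaged along the scan direction, which is consistent with the blurring of the mask roughness described above.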

An interferometric measurement of the vertical structure is given in fig. 3. The curvature at the
top and at the bottom of the structure is caused by the lateral straggling of the protons at the
bottom and by the imperfect shape of the drilled mask at the top. In the central region, which is
shown enlarged in fig. 4, the flatness is better than 0.5 µm. A
horizontal line scan is shown in fig. 5. The roughness is better than 20 nm.


The method of writing and polishing deep structures in PMMA opens a potential for building
integrated microoptical setups. For example, if the mask diameter is adapted to the diameter of
optical single mode fibers (e.g. 125 µm), other optical elements like integrated fiber-lens holders
can be fabricated together in one block with miniaturized refractive prisms and beam splitters.
Therefore complex miniaturized refractive setups can be constructed in an integrated form (e.g.
perfect shuffle, star couplers, interferometers), which offers compact size and reduced degrees of
alignment freedom.

Fig. 3 Profile of the 500 µm deep vertical surface.

Fig. 4 Zoom of the central region of the vertical surface


Fig. 5 Horizontal scan of surface profile

The growing complexity and ever-increasing miniaturization of optical systems has made the development of
sophisticated, flexible and compact interconnection components both necessary and urgent [1-3]. The functionality
of conventional DOEs can be increased by making them polarization-selective, i.e. anisotropic. Recently Ford et al.
described the fabrication and characterization of anisotropic DOEs (ADOEs) in LiNbO3 substrates using ion
milling. However, to make ADOEs more generally useful, it would be desirable to use a material which is both
cheaper and easier to process. Here we demonstrate that calcite is an attractive alternative, with several advantages
over LiNbO3.

For a conventional DOE the optical phase function is obtained by etching a relief pattern in an isotropic substrate. In
the case of an ADOE two desired phase functions are generated by means of two etched substrates, of which at least
one is anisotropic, joined together at their etched surfaces. This configuration is shown in figure 1, where $n_o$, $n_e$
and $n_o'$, $n_e'$ are respectively the ordinary and extraordinary refractive indices of the first and second substrate material,
$n_g$ is the refractive index of the gap material; $d_1$ and $d_2$ are the etching depths of the first and second substrate, and
$\phi_{TE}$ and $\phi_{TM}$ are the relative phases of the orthogonally polarized beams after passing through the ADOE.


Figure  1.  a.  Schematic  representation  of  the  configuration  of  the  ADOE. 

b.  Photograph  of  an  anisotropic  lens,  as  seen  through  an  axicon  microscope. 

The etch depths are given by [4]:

$$d_1 = \frac{\lambda}{2\pi}\,\frac{(n_e' - n_g)\,\phi_{TE} - (n_o' - n_g)\,\phi_{TM}}{(n_e - n_g)(n_o' - n_g) - (n_e' - n_g)(n_o - n_g)} \qquad (2.1)$$

$$d_2 = \frac{\lambda}{2\pi}\,\frac{(n_e - n_g)\,\phi_{TE} - (n_o - n_g)\,\phi_{TM}}{(n_e' - n_g)(n_o - n_g) - (n_e - n_g)(n_o' - n_g)} \qquad (2.2)$$

For ease of fabrication and to remain within the geometrical optics approximation small etch depths are preferred.
This can be achieved by using two identical highly birefringent substrates, with optical axes mutually perpendicular
($n_o' = n_e$, $n_e' = n_o$) and normal to the incident beam. Calcite was chosen because of its high birefringence ($n_o$ = 1.66,
$n_e$ = 1.49, $\Delta n$ = 0.17 for $\lambda$ = 633 nm [5]), relatively low price and ready availability and because it can be processed
using wet etching technology. Moreover, since its refractive indices are close to 1, reflection losses will be small.

As $d_1$ and $d_2$ are functions of both $\phi_{TE}$ and $\phi_{TM}$, N discrete phase values require N² etch depth levels. To
minimize the number of processing steps we have designed binary phase elements which require 4 etch depths (and
therefore 2 masks) for each substrate. The etch depths in this case are given in table 1.


Phase functions    Etching depths
φTE     φTM        d1 (µm)                 d2 (µm)
0       0          1.06 = 1.06 + 0         1.06 = 1.06 + 0
0       π          1.85 = 1.06 + 0.79      0    = 0    + 0
π       0          0    = 0    + 0         1.85 = 1.06 + 0.79
π       π          0.79 = 0    + 0.79      0.79 = 0    + 0.79

Table 1. Etch depths (d1, d2) of the calcite substrates when air is used as gap material.
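
The entries of table 1 can be checked numerically from equations 2.1 and 2.2. The sketch below is a minimal evaluation under the values quoted in the text (calcite with n_o = 1.66 and n_e = 1.49 at 633 nm, air gap, crossed optical axes so that the second substrate has its indices swapped). Shifting each substrate's depths so that the shallowest level is zero is an assumption made here to obtain non-negative depths; a uniform depth offset only adds a constant phase. Function names are hypothetical.

```python
import math


def etch_depths_um(phi_te, phi_tm, wavelength_um=0.633,
                   n_o=1.66, n_e=1.49, n_g=1.0):
    """Evaluate equations 2.1 and 2.2 for two identical, crossed substrates."""
    n_o2, n_e2 = n_e, n_o  # second substrate: no' = ne, ne' = no
    k = wavelength_um / (2.0 * math.pi)
    d1 = k * ((n_e2 - n_g) * phi_te - (n_o2 - n_g) * phi_tm) / (
        (n_e - n_g) * (n_o2 - n_g) - (n_e2 - n_g) * (n_o - n_g))
    d2 = k * ((n_e - n_g) * phi_te - (n_o - n_g) * phi_tm) / (
        (n_e2 - n_g) * (n_o - n_g) - (n_e - n_g) * (n_o2 - n_g))
    return d1, d2


def binary_depth_table():
    """Depths for the four binary phase pairs, shifted so the minimum is 0."""
    combos = [(0.0, 0.0), (0.0, math.pi), (math.pi, 0.0), (math.pi, math.pi)]
    raw = [etch_depths_um(te, tm) for te, tm in combos]
    min1 = min(d1 for d1, _ in raw)
    min2 = min(d2 for _, d2 in raw)
    return [(d1 - min1, d2 - min2) for d1, d2 in raw]


if __name__ == "__main__":
    for d1, d2 in binary_depth_table():
        print(f"d1 = {d1:.2f} um   d2 = {d2:.2f} um")
```

With these rounded index values the computed depths agree with table 1 to within about 0.02 µm.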

Equations 2.1 and 2.2 can be rewritten as

$$d_1 = C_1\,\bar{\phi}_{TE} + C_2\,\phi_{TM} \qquad (2.3)$$

$$d_2 = C_2\,\phi_{TE} + C_1\,\bar{\phi}_{TM} \qquad (2.4)$$

where $C_1$ and $C_2$ represent positive constants, and $\bar{\phi} = \pi - \phi$ denotes the inverse of a binary phase function.
For each substrate the first mask is simply the
desired binary phase function while the second is the inverse of the complementary phase function.

The patterns are plotted and then photoreduced to fabricate the masks. Photolithography is used to pattern the
photoresist coating on the calcite substrates. Transfer to the calcite substrates is done by wet etching in a 1/1000
HCl solution at an average etching rate of 382 Å/s. After etching the substrates are aligned to one another using a
standard mask aligner and then permanently joined together using UV-curing optical cement.
Since the obtained phase functions are the result of the combination of both surface profiles, the alignment of the two
masks for each substrate, and of the two etched substrates, will be critical. Therefore the resulting efficiency will be
strongly dependent on the performance of the mask aligner which is used.

Anisotropic Fresnel lenses and gratings have been fabricated and characterized experimentally. Important features
are the diffraction efficiency and the contrast ratio. The diffraction efficiency is defined as the fraction of the
incident light that diffracts into a predefined area. In the case of the Fresnel lenses this area is a circle with a
diameter of 200 µm and in the case of the gratings it is two circles of the same size as the incident beam. The contrast
ratio at each of the two desired images is defined as the ratio between the intensities at that image for the two
orthogonal polarizations.

We have measured an average diffraction efficiency of 11% and a contrast ratio of over 5:1 for a Fresnel lens which
focuses one polarization at a distance of 20 cm and the other at 60 cm. Lenses with a smaller f/# have also been made
but have both lower efficiencies and contrast ratios. This is probably due to the fact that these elements have smaller
feature sizes such that fabrication errors (misalignment of the masks, under-etching) become more important.
Because in the case of an anisotropic lens the two desired images are partially overlapping (both on-axis and
focusing) these elements are not well suited to measurement of the contrast ratio. Therefore diffraction gratings
(with orthogonal grating vectors on the substrates) were also fabricated. We have measured an average efficiency of
27% and a contrast ratio of over 110:1 for an anisotropic grating with a period of 136 µm. Other gratings with
smaller periods have been made but have lower efficiencies and contrast ratios (as is the case with Fresnel lenses
with smaller f/#).

These anisotropic gratings show great potential for use in systems where variable routing and interconnection are
important (e.g. in optical computing). This is illustrated by the system shown in figure 2a, which consists of a liquid
crystal polarization modulator and grating #4. The beam can be deflected to two different pairs of points by
changing the amplitude of the voltage applied to the liquid crystal modulator. We have experimentally characterized
this system by simultaneously measuring the intensity in each diffraction order while the voltage on the liquid crystal
is modulated. The results are given in figure 2b, showing the high contrast ratios obtained here. The modulation
speed in this system is limited by the modulation speed of the liquid crystal (up to 10 kHz in the case of a
ferroelectric liquid crystal [6]).


Figure 2. a. Variable interconnection system consisting of a liquid crystal polarization
modulator and an anisotropic grating.
b. Top: Amplitude of the 2 kHz AC voltage applied to the liquid crystal.
Middle: Intensity in one of the TE-generated spots.
Bottom: Intensity in one of the TM-generated spots.

We have also investigated the potential of anisotropic DOEs for the implementation of more complex fan-out and
interconnection operations. Two 64×64 pixel binary phase computer generated holograms (CGHs) were designed by
use of the simulated annealing algorithm [7] to diffract light into 14×15 off-axis orders. These were fabricated in
calcite as a single ADOE with a 25 µm pixel size. A contrast ratio of 2.4 was obtained, together with a diffraction
efficiency of 9.5%. These relatively low values are due to misalignment between the two substrates. Further results
in this area will be presented.

We have demonstrated the fabrication of highly polarization-selective components with arbitrary functionalities in
calcite, using simple wet etching technology. The best results were obtained for an anisotropic grating, splitting an
incident beam horizontally into two beams when TE-polarized, and vertically when TM-polarized. This element had
an average total efficiency of 27% and a polarization contrast ratio of more than 110:1. We have incorporated this
element in an electrically controlled optical beam deflector which allows us to switch light between two points
without any mechanical components.
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Several  broad  classes  of  neural  networks  comprise  distributed,  nonlinear,  dynamical  systems 
in  which  large  numbers  of  relatively  simple  processing  elements  (neuron  units)  are  densely 
interconnected.  The  interconnections  are  often  configured  such  that  the  interconnection  weights  are 
adaptive  and  contain  the  learned  memories  and  behaviors  of  the  system.  Advanced  optical 
interconnection  techniques  are  being  developed  that  can  potentially  be  used  in  conjunction  with 
optoelectronic neuron units to implement photonic neural-like computational modules (e.g., Fig. 1)
with  relatively  large  array  sizes  (10^  to  10^  neuron  units)  and  a  high  degree  of  connectivity  (fan-outs 
and  fan-ins  of  10^  to  10^,  with  10^  to  10^2  total  interconnections).  A  key  open  question  is  whether 
the  high  bandwidths  (potentially  100  MHz  or  more)  available  from  hybrid  optoelectronic  spatial 
light  modulators  (SLMs)  can  be  effectively  combined  with  such  high  density  volume  holographic 
optical  interconnections  (dynamically  recorded  in  photorefractive  materials)  to  provide  enhanced 
computational  throughput  capacity  as  well  as  complex  neural  network  simulation  capability.  A 
second  key  open  question  is  whether  advanced  electronic/photonic  packaging  technologies  can 
provide  capability  for  system-level  integration  of  highly  compact  multichip  modules  that  exhibit 
both  local  (multi-plane)  and  global  interconnections  (Fig.  2). 

Incorporation  of  appropriate  detection  elements,  control  circuitry,  modulators,  and 
diffractive optical elements in photonic computational modules allows for broad latitude in the
implementation of various neural network models. For example, such modules can potentially be
configured  to  emulate  the  Dynamic  Link  Architecture  of  von  der  Malsburg  [1],  which  utilizes  elastic 
graph  matching  techniques  for  pattern  recognition  applications.  In  this  particular  architecture, 
explicit  use  is  made  of  neuron-unit  temporal  correlations  for  enhancing  synaptic  interconnections, 
while  individual  neuron-unit  activation  potentials  are  derived  from  temporally  correlated  inputs. 

In  this  presentation,  a  particular  photonic  neural  network  architecture  [2]  is  described  that  is 
based  on  double  angularly  multiplexed  incoherent/coherent  volume  holographic  recording  and 
readout,  provides  for  simultaneous  recording  of  the  weight  updates,  accommodates  a  high  degree  of 
signal fan-out and fan-in with incoherent (intensity-based) summation, minimizes interconnection
crosstalk  and  throughput  losses  [3],  and  allows  for  single-step  copying  of  the  entire  (learned)  volume 
holographic  interconnection  pattern.  The  current  development  status  of  key  photonic  components 
for  such  neural  network  implementations  will  be  addressed,  including  two-dimensional  arrays  of 
individually  coherent  but  mutually  incoherent  sources,  hybrid-integrated  (flip-chip  bonded)  spatial 
light  modulators  composed  of  silicon  drive  electronics  and  indium  gallium  arsenide  asymmetric 
cavity  Fabry-Perot  multiple  quantum  well  modulators  (Figs.  3  and  4)  [4,  5],  optical  disk  spatial  light 
modulators,  and  both  photorefractive  and  stratified  volume  holographic  optical  elements.  The 
integration  of  several  of  these  key  components  into  vertically-interconnected  multichip  modules 
anticipates  the  development  of  multilayer  retina-like  structures  that  can  extract  appropriate 
representations  for  subsequent  processing  in  a  single-layer  or  multilayer  neural  network  topology. 
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Introduction 

Real-time classification of wide instantaneous bandwidth temporal signals such as radar range profiles or
wideband communications signals presents a challenging application to demonstrate the capabilities of adaptive,
feature-based optical processing systems. In this paper we discuss progress on a nonlinearly cascaded
optical neural classifier for this application. The system uses a holographic optical learning subsystem to
classify time-shift invariant features computed from input wideband signals. The optical subsystems are
cascaded using an optically addressed spatial light modulator (OASLM) implementing a saturating square-law
nonlinearity. Through the use of a two-dimensional, optically computed trilinear input feature we take
full advantage of the high space-bandwidth and high throughput processing capabilities specific to optical
architectures to solve an otherwise intractable real-time signal-processing task such as wideband signal
recognition. In what follows an experimental demonstration of this cascaded system is shown.

Cascaded  System 


Figure 1: Schematic layout of cascaded acoustooptic time-frequency processor coupled through a
time-integrating spatial light modulator into a volume holographic adaptive classifier.

In the experimental system we cascade an AO time-frequency processor (the architecture uses a
four-transducer surface acoustic wave acoustooptic device [1]) implementing the triple-autocorrelation transform:

$$C(\tau_x, \tau_y) = \int s(t)\, s(t + \tau_x)\, s(t + \tau_y)\, dt \qquad (1)$$

into a holographic optical learning system to classify individual wideband signals as shown in Figure 1. This
system utilizes the wide-band and high-throughput capabilities of acoustooptic systems, the massive parallelism
and adaptive capabilities of dynamic volume holograms, and sophisticated neural learning algorithms
to solve an otherwise intractable real-time signal processing task. The arbitrary signal generator is utilized
to control and train the adaptive system by repetitively cycling through a large library of wideband signals
from the data base. The AO triple product processor is used to calculate an invariant feature space (the
triple autocorrelation) on each wideband signal through time integration on the high speed spatial light
modulator. The time integrating processor produces an output modulated by a fringe pattern on a large
bias. The coherent readout of the SLM allows this pattern to be Schlieren imaged, thereby blocking the
bias and removing the fringe structure. This 2-D pattern can then be used to read out the correlations with
the templates stored in the volume hologram, giving an array of correlations on the linear detector array,
allowing both classification and training. During training, this classification is adapted by computing the
errors from the desired behavior, applying the error vector to an array of modulators, and exposing the
hologram appropriately. After training has converged to an acceptable level of performance the hologram
may need to be fixed, so that unwanted erasure is prevented.

Figure 2: Two wideband signals; temporal, spectral and triple autocorrelation representations

In the holographic storage of weights, grating amplitude represents the weight magnitude, and grating
phase represents the sign of the weight. Ideally, grating phase should be either 0 or π; its deviation from these
values can cause difficulty for the optical learning system. Both short-term and long-term phase stability
will have to be guaranteed, which may require active stabilization. Automatic modification of the stored
holograms can be accomplished by supplying a number of angularly multiplexed reference beams whose
phase and amplitude can be independently controlled by a circuit generating an error signal proportional to
the difference between the detected reference beams reconstructed by the stored holograms and a training
vector made available to the circuit during a training cycle. The exposure of the volume hologram to the
interference pattern between the bipolar modulated array of error signals and the Schlieren imaged and bias
removed time-frequency distribution accomplishes the necessary outer product updating for either perceptron
or LMS learning.
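
The outer product update mentioned above can be written out for the LMS case. The sketch below is the textbook LMS (delta) rule in plain Python, not a model of the optical hardware: the weight matrix stands in for the stored holograms, the input vector for the feature pattern, and the error vector for the bipolar modulated reference beams. The function name and the learning rate are illustrative assumptions.

```python
def lms_update(weights, x, target, rate):
    """One LMS step: W <- W + rate * outer(target - W x, x).

    weights: list of rows (one row per output class)
    x:       input feature vector
    target:  desired output vector
    Returns the updated weights and the error measured before the update.
    """
    output = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    error = [t - y for t, y in zip(target, output)]
    updated = [[w + rate * e * xi for w, xi in zip(row, x)]
               for row, e in zip(weights, error)]
    return updated, error


if __name__ == "__main__":
    w = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    x = [1.0, 0.0, -1.0]
    target = [1.0, -1.0]
    for step in range(4):
        w, err = lms_update(w, x, target, rate=0.25)
        print(step, err)
```

For a single repeated input, each step scales the error by (1 - rate * |x|^2), so the error shrinks geometrically when the rate is chosen small enough.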

These processors are cascaded using a two-dimensional time-integrating spatial light modulator. Computation
of a complete time-integrated triple autocorrelation only requires 10-100 µs, which can be detected
and cascaded into the holographic pattern recognition system using a high-speed SLM such as the ferroelectric
liquid crystal devices [2]. This approach avoids the 30 frames per second detector bottleneck suffered
when the output of a 2-dimensional optical processor is categorized electronically, so this cascaded processor
can achieve a throughput that is well beyond the capabilities of conventional electronic systems.

Classification and discrimination of multiple signals of unknown time of arrival cannot be accomplished
in a single layer classifier using the temporal waveform of the signal at an arbitrary shift as the input
feature, because of limited numbers of degrees of freedom. Shift-invariant classification of wideband temporal
signals can be accomplished using simple features based on the power spectra when the signal spectra offer
separable features. In figure 2 we see three representations of two different wideband temporal signals:
digital modulation sequences for a binary phase-shift keyed (BPSK) communications channel. In this case
the spectral domain features are nearly identical and so cannot be used to discriminate these signals, whereas
in the triple-autocorrelation representation the signals are orthonormal, hence trivially separable using a
single-layer classifier. The triple autocorrelation representation allows time-shift invariant classification as
well, since for any shift of the input signal the 2-dimensional feature is invariant and centered in the output
plane.
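
The shift invariance of the triple autocorrelation can be illustrated with a small discrete example. The sketch below uses a circular, sampled version of the transform, which is an assumption made to keep the invariance exact for a finite sequence; the optical processor computes a time-integrated continuous version. The short ±1 sequence is arbitrary and is not one of the signals of figure 2. As a side illustration, negating a signal leaves its ordinary (second-order) autocorrelation unchanged but flips the sign of the third-order feature.

```python
def triple_autocorrelation(s):
    """Circular triple autocorrelation C[tx][ty] = sum_t s[t] s[t+tx] s[t+ty]."""
    n = len(s)
    return [[sum(s[t] * s[(t + tx) % n] * s[(t + ty) % n] for t in range(n))
             for ty in range(n)]
            for tx in range(n)]


def autocorrelation(s):
    """Circular (second-order) autocorrelation, for comparison."""
    n = len(s)
    return [sum(s[t] * s[(t + tau) % n] for t in range(n)) for tau in range(n)]


def circular_shift(s, k):
    """Return s delayed by k samples with wrap-around."""
    k %= len(s)
    return s[k:] + s[:k]


if __name__ == "__main__":
    bpsk = [1, 1, -1, 1, -1, -1, 1, 1]       # arbitrary +/-1 chip sequence
    shifted = circular_shift(bpsk, 3)
    negated = [-v for v in bpsk]

    # Same feature for any time shift of the input.
    print(triple_autocorrelation(bpsk) == triple_autocorrelation(shifted))
    # Second-order statistics cannot tell a signal from its negative ...
    print(autocorrelation(bpsk) == autocorrelation(negated))
    # ... but the third-order feature can.
    print(triple_autocorrelation(bpsk) == triple_autocorrelation(negated))
```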

Experimental  Results 

The simulated triple-autocorrelation of a digital BPSK modulation sequence is shown in figure 3a for
reference; figure 3b shows the raw output of the optical triple-autocorrelator where the salient feature is on
a spatial carrier; figure 3c shows the image as it passes through the LCLV (note the spatial carrier remains);
and figure 3d shows the Schlieren image of the output (or read) side of the LCLV. The optically computed
image is a good match with the computer simulation but is computed in real-time. As evident from figure 3,
Schlieren imaging removes the high optical bias, the spatial carrier on the computed features, and much of
the fixed-pattern noise in the LCLV output.

Figure 3: a. Digitally computed two-d autocorrelator output; b. "raw" optical two-d autocorrelator output;
c. direct imaging of LCLV output; d. Schlieren imaged output with bias removed.

Figure 4: Classification of wideband signals (linescans of reconstructed references): a. sequence 1;
b. sequence 2; c. sequence 3.

Figure 4 shows an experimental demonstration of the cascaded classifier trained using three digital modulation sequences as shown above. Scheduled recording [3] of the complex holographically-stored classification filters permits equal diffraction efficiencies. When each signal is presented to the feature extractor, the trained holographic classifier reconstructs the plane wave reference at the appropriate angle identifying the class; the training signals were contaminated by noise (20 dB SNR) and used to test the classification performance. Relatively low fixed-pattern noise provides overhead for a good discrimination threshold; the successfully classified signal reconstructs the reference approximately 10 dB above the fixed-pattern noise or crosstalk.


Conclusions 

We  have  cascaded  a  high-throughput  acousto-optic  triple-product  feature  extractor  with  an  optical  linear 
machine implemented using a bank of holographically-stored complex filters in a photorefractive crystal. This
architecture  demonstrates  the  compatibility  of  optical  signal  processing  systems  for  feature  extraction  with 
adaptive  holographic  learning  systems  and  enables  the  application  of  neural  learning  algorithms  to  challenging 
signal  classification  problems  such  as  wide  instantaneous  bandwidth  temporal  signal  identification. 
1  Introduction

Pap-smears are slides of cellular material used to screen for cervical cancer. Currently pap-smears are examined manually, a repetitive and tedious task which leads to a false negative rate (the percentage of abnormal slides going undetected) of about 18% [1]. Automated pap-smear examination is desirable both as a quality control mechanism for detecting abnormal slides missed by human inspection and as a primary diagnostic cytology screen. Automated screening is challenging since it is a typical example of the "needle-in-a-haystack" problem, where the features of interest are hidden in a vast search area. In a pap-smear one in 10,000 cells screened may be abnormal. Detecting this cell requires high computation power and throughput. Each slide is 2.5 cm × 5.0 cm. Features of interest (the cell nuclei) are on average 10 microns in diameter, and sampling at 0.8 µm/pixel (equivalent to screening the slide with a 20x objective) therefore requires at least 37,000 images of 256x256 pixels to be processed for each slide screened. In this paper we present an optoelectronic implementation of the morphological hit-or-miss transform which can scan each pap-smear slide and detect the regions of interest (ROI) in that slide in under five minutes.

Figure 1(a) is an image from a pap-smear slide depicting normal and abnormal squamous cells from the cervix, white blood cells and background mucus. The ROI (abnormal areas) can be detected by examining the shape, size and optical density of the cell nucleus and the nucleus-to-cytoplasm area ratio. Our initial attempts at detecting the ROI used integrated optical density (IOD) and template matching, both of which were found to be unsuitable. The IOD method is simple and fast, but gave rise to many false positives. Template matching using a bank of templates requires extensive post-processing while failing to discriminate between normal and abnormal areas in the presence of clutter. The morphological hit-or-miss transform (HoM) was discriminating enough and also fast, as it can be implemented using a thresholding optical correlator. A computer simulation of the HoM transform implemented as a thresholding correlator detected 95% of the abnormal regions correctly in 187 images. Figure 1(b) shows the areas of the image detected as abnormal by the computer simulation superimposed on the original image. In the simulation the gray scale image is thresholded to pick up all the pixels below 128 graylevels, and the HoM transform is used to detect any regions in the thresholded image which are roughly circular in shape, with a diameter ranging from 12 µm to 20 µm.


2  Morphological  hit-or-miss  transform 

The HoM transform is a three step process. The first step, called the "hit", detects all regions in the image with a diameter greater than or equal to 12 µm. This is done by performing a morphological erosion on the image with a hit structuring element (SE) of 12 µm diameter. The second step, called the "miss", detects all the regions with a diameter less than or equal to 20 µm. This is done by eroding the complement of the input image with a miss SE which is an annulus with an inner diameter of 20 µm. The final step is the Boolean AND of the hit and the miss to detect all the regions ranging from 12 µm to 20 µm. The HoM transform is symbolically written as [2]

$$A \circledast (H, M) = (A \ominus H) \cap (A^{c} \ominus M) \qquad (1)$$
Figure 1: (a) Pap-smear image: example of an image from a pap-smear slide from our annotated database. The image was acquired from the microscope using a 20x objective and a 508x480 pixel CCD camera for a sampling resolution of 0.8 µm/pixel. (b) ROI detection on the pap-smear image: results from computer simulation of the hit-or-miss transform superimposed upon the original image. Note that it has picked up all the large dark nuclei while ignoring all other areas of the image.


where A is a binary image, H and M are the hit and miss SE, $\ominus$ denotes erosion, $\cap$ denotes Boolean AND and $A^{c}$ denotes the complement of A. Erosion can be optically implemented as a correlation of the image with the SE, followed by a threshold. This can be symbolically written as $X \ominus B = T_{\epsilon}\{X \star B\}$, where $\star$ stands for correlation and $T_{\epsilon}\{\cdot\}$ is a thresholding function which takes the value 1 if its argument is at least $\epsilon$ and 0 otherwise. To obtain erosion the threshold level $\epsilon$ is set to N, where N is the cardinality of B. Hence the HoM can be written as

$$A \circledast (H, M) = T_{h}\{A \star H\} \times T_{m}\{A^{c} \star M\} \qquad (2)$$

The HoM can be optically implemented by first correlating the image A with the hit SE, H, and thresholding the result. Then $A^{c}$ is correlated with the miss SE, M, and thresholded. Finally, the two erosions are multiplied to yield the HoM transform of A.
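The three steps (hit correlation and threshold, miss correlation and threshold, multiplication) can be sketched digitally. This is a toy illustration under stated assumptions, not the optical implementation: the structuring elements are a 3x3 square and the perimeter of a 5x5 square rather than the 12 µm disc and 20 µm annulus, the threshold fires when the correlation is at least the threshold level (so that a level equal to the SE cardinality is attainable), and pixels outside the image are treated as background. All names are hypothetical.

```python
# Sketch: hit-or-miss transform as two thresholded correlations.

def correlate(image, offsets, fill=0):
    """Correlate a binary image with an SE given as (dy, dx) offsets."""
    rows, cols = len(image), len(image[0])

    def pixel(y, x):
        return image[y][x] if 0 <= y < rows and 0 <= x < cols else fill

    return [[sum(pixel(y + dy, x + dx) for dy, dx in offsets)
             for x in range(cols)]
            for y in range(rows)]

def threshold(plane, level):
    """T_level{.}: 1 where the correlation reaches the level, else 0."""
    return [[1 if v >= level else 0 for v in row] for row in plane]

def hit_or_miss(image, hit, miss, h=None, m=None):
    """Thresholded-correlation form of the HoM; h and m default to the SE cardinalities."""
    h = len(hit) if h is None else h
    m = len(miss) if m is None else m
    complement = [[1 - v for v in row] for row in image]
    hit_plane = threshold(correlate(image, hit, fill=0), h)
    # Outside the image is background, so the complement is 1 there.
    miss_plane = threshold(correlate(complement, miss, fill=1), m)
    return [[a * b for a, b in zip(r1, r2)]
            for r1, r2 in zip(hit_plane, miss_plane)]

# Toy structuring elements (assumed, for illustration only).
HIT = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
MISS = [(dy, dx) for dy in range(-2, 3) for dx in range(-2, 3)
        if max(abs(dy), abs(dx)) == 2]

# Test image: a 3x3 blob (right size), a 5x5 blob (too big), one pixel (too small).
image = [[0] * 20 for _ in range(9)]
for y in range(3, 6):
    for x in range(2, 5):
        image[y][x] = 1
for y in range(2, 7):
    for x in range(8, 13):
        image[y][x] = 1
image[4][17] = 1

output = hit_or_miss(image, HIT, MISS)
detections = [(y, x) for y, row in enumerate(output)
              for x, v in enumerate(row) if v]
print(detections)
```

Only the centre of the 3x3 blob survives: the single pixel fails the hit, and the 5x5 blob passes the hit but fails the miss because the ring lands on object pixels. Passing lower values of `h` and `m` relaxes the match in the same way as lowering the optical thresholds.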


3  Simulation  Results 

The classical HoM is very sensitive to noise and size/shape perturbations and will only detect a feature if the feature exactly matches the shape of the hit and miss structuring elements. However, in our application the feature of interest, namely the nucleus, varies slightly in shape and size. This mismatch between the feature and the SE results in reduced overlap between the SE and the feature, and hence the mismatch can be accounted for by setting lower thresholds after the hit or miss correlations. Let the hit kernel be a disc with a diameter of 12 µm and the miss kernel be an annulus with an inner diameter of 20 µm and an outer diameter of 22 µm. Sampling these kernels at 0.8 µm/pixel yields a cardinality of 177 and 103 for the hit SE, H, and miss SE, M, respectively. Setting the thresholds h and m at 177 and 103 in Eq. (2) would detect only those features which matched the circular kernels. An oval feature which is mismatched to the circular kernels can be detected by these kernels only if the thresholds are set lower than 177 and 103. Figure 2 shows the performance of the HoM transform as a receiver operating characteristic curve. The hit threshold, h, was set at 89 to allow up to 50% mismatch between the feature and the hit SE. The curve was then obtained by varying m from 103 to 52, which corresponds to a 0% to 50% mismatch, respectively, between the feature and the miss SE. Note that the top-right corner of the curve shows that the HoM detects 95% of the suspect regions correctly while detecting only 4.5% of the normal regions as suspect. This point corresponds to a 50% mismatch in both the hit and miss SE.
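The cardinalities and thresholds quoted above can be checked with a short calculation. This sketch assumes the cardinality is approximated by the continuous area of the kernel in pixel units (the actual count on a sampled grid may differ by a few pixels) and that a threshold for a given mismatch is the cardinality scaled by one minus the mismatch, rounded up. The function names are hypothetical.

```python
import math

SAMPLING_UM_PER_PIXEL = 0.8  # sampling resolution quoted in the text

def disc_cardinality(diameter_um):
    """Approximate pixel count of a disc SE from its continuous area."""
    d = diameter_um / SAMPLING_UM_PER_PIXEL
    return round(math.pi * d ** 2 / 4)

def annulus_cardinality(inner_um, outer_um):
    """Approximate pixel count of an annular SE from its continuous area."""
    d_in = inner_um / SAMPLING_UM_PER_PIXEL
    d_out = outer_um / SAMPLING_UM_PER_PIXEL
    return round(math.pi * (d_out ** 2 - d_in ** 2) / 4)

def threshold_for_mismatch(cardinality, mismatch):
    """Threshold that tolerates the given fractional mismatch (0.0 to 1.0)."""
    return math.ceil(cardinality * (1 - mismatch))

hit_n = disc_cardinality(12)          # hit SE: 12 um disc
miss_n = annulus_cardinality(20, 22)  # miss SE: 20 um to 22 um annulus

print(hit_n, miss_n)
print(threshold_for_mismatch(hit_n, 0.5), threshold_for_mismatch(miss_n, 0.5))
```

Under these assumptions the area approximation reproduces the quoted cardinalities (177 and 103) and the 50% mismatch thresholds (89 and 52).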


Figure 2: Performance of the hit-or-miss feature detector as a function of the mismatch allowed between the feature and the miss SE.

Figure 3: Optoelectronic processor to perform the hit-or-miss operation to detect abnormal areas in a pap-smear slide.


4  Optoelectronic  implementation 

Figure 3 shows the optoelectronic implementation of the HoM transform using a 4f VanderLugt correlator. The input image to the correlator is acquired directly from the pap-smear slide PS using a microscope objective L2 with f/# = 1.25 and written onto an OASLM [3], which thresholds the image to obtain a binary input image. W1 is a ferroelectric liquid crystal (FLC) switchable half-wave plate used to obtain a contrast-reversed image for the miss operation, PBS is a polarizing beamsplitter, L3 is an f/# = 6.4 Fourier transform lens, EASLM is the filter-plane electrically addressed spatial light modulator [4], W2 is a passive half-wave plate used to align the input linearly polarized light along the bisector of the two switched states of the FLC, and the output smart detector array is a custom-designed VLSI device used to threshold and AND the hit and miss [5]. Initially the image on the OASLM is read with a collimated, vertically polarized laser beam with W1 aligned along the vertical axis. This image is transmitted by the PBS, Fourier transformed by L3 and multiplied by the hit filter in binary phase only (BPO) representation on the EASLM. The reflected light from the EASLM is Fourier transformed by L3 and the vertically polarized light is reflected to the smart detector array, which thresholds the image to obtain the hit erosion. To obtain the image complement the optic axis of W1 is electrically rotated by 45°. The corresponding miss BPO filter is written to the EASLM to obtain the miss correlation at the detector plane. The smart detector array thresholds the result to obtain the miss erosion and then ANDs the two thresholded images to get the HoM output.

Experimental  results  from  this  HoM  processor  detecting  ROI  on  pap-smears  will  be  presented  at  the 
meeting. 
In recent years there has been a resurgence of interest in holographic memories. Most of the recent experiments in holographic storage have been in LiNbO3, in which up to 10,000 holograms have been stored in one location [1], or the DuPont photopolymer, in which 1,000 holograms were stored [2]. A technique called peristrophic multiplexing was combined with conventional angle multiplexing to store the 1,000 holograms in the polymer, which has a thickness of only 100 microns. Most of the development of holographic memories is aimed at digital computer storage. In this paper we focus instead on the application of holographic memories to image processing. Specifically, we use the peristrophic system as an optical database to store images to navigate a small car autonomously along specified paths. This experiment suggests that the two best features of holographic storage, capacity and parallel access, can be put to good use in real time machine vision applications.

The peristrophic memory system is shown in Figure 1a. It is very similar to a conventional angle multiplexed system, with the signal beam normal to the surface of the medium and a plane wave reference beam incident at an angle. In a peristrophic memory, holograms are multiplexed by rotating the medium around the surface normal (which is also the direction of the signal beam in this case). The film rotation causes the reconstruction of a recorded hologram to move away from the output detector array, which makes it possible to record a new hologram on the rotated film. Stored data is retrieved by illuminating the hologram with the reference plane wave and rotating the film to the appropriate peristrophic position. Typically, 100 or more holograms can be peristrophically multiplexed independent of the hologram thickness. The same system can also be configured as an array of optical correlators (Figure 1b). In this case the hologram is illuminated with the signal beam and a "ring" of correlations is produced surrounding the image of the input SLM. If angle multiplexing is combined with peristrophic multiplexing, then multiple concentric rings of correlations form at the output. Previously, we have demonstrated up to 1,000 stored images and hence 1,000 correlations. The correlations can be detected in parallel by multiple detector arrays. Alternatively, a single detector array can be used at one correlation position. In this case, the hologram is rotated and the memory is searched serially. This serial mode is used in the experiment we describe in this paper since this mode is well suited for the car navigation problem.

The experiment was done with a small car that we put together ourselves. The car has three wheels and it carries a CCD camera, a video transmitter (to relay the video to the optical table), a remote-control receiver (to receive the control signals for turning and speed), two drive motors (one for each of the two front wheels), and two lead-acid batteries. First, the car is moved manually along the desired course. The images that the car-mounted camera sees are sampled periodically and recorded in DuPont's photopolymer through peristrophic multiplexing. The rotation between holograms is small enough so that three correlation peaks can fit within the detector array placed at the correlation plane. The bottom, middle, and top correlation peaks represent the previous way-point, the current position of the car, and the next way-point, respectively. Then, after the entire path has been so mapped, the car is returned to the original position and the photopolymer is returned to the original angle. The video transmitted back from the car is presented on the SLM and is correlated with the stored holograms. What the car sees now is what is stored as the first hologram, so a strong correlation peak appears in the middle of the detector array. A weaker peak representing the next way-point along the path also appears above the middle correlation peak. The car is then commanded to move forward. A personal computer monitors the digitized correlation peaks as seen by the CCD at the output of the correlator. The computer extracts steering information from the lateral position of the middle correlation peak and transmits it to the car. If the car is off-course to the left, the middle correlation peak will be to the right of center and the car is instructed to turn right. Conversely, if the car is off-course to the right, the middle correlation peak will be to the left of center and the car is instructed to turn left. Furthermore, when the intensity of the top correlation peak becomes stronger than the middle correlation peak, the car is assumed to have reached the next way-point along the path and the computer rotates the hologram. This causes the top peak to now appear at the middle. In this way, as the car proceeds, it is steered through the series of way-points and it stays on the desired course. This mode of navigation is the "follow" navigation mode. The system automatically switches to other navigation modes (controlled by software in the computer) to allow the car to execute sharp turns, search for a familiar path when it is lost, or switch between two paths. In this way we were able to program the optical memory and the PC to guide the car to complete various complex trips.
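The decision rule of the "follow" mode can be summarized in a few lines. This is a minimal sketch of the logic as described in the text, not the software used in the experiment: the `Peak` type, the `follow_step` function, the dead band around the center, and the convention that larger x means further right on the detector are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Peak:
    x: float          # lateral position on the detector (pixels); larger = further right
    intensity: float  # digitized peak intensity

def follow_step(middle, top, center_x, deadband=2.0):
    """Return (steering command, advance flag) for one control update.

    middle: correlation peak for the current way-point
    top:    correlation peak for the next way-point
    deadband: assumed tolerance (pixels) inside which no turn is issued
    """
    # Top peak stronger than middle: next way-point reached, rotate the hologram.
    advance = top.intensity > middle.intensity

    # Peak right of center means the car is off-course to the left: turn right.
    offset = middle.x - center_x
    if offset > deadband:
        steer = "right"
    elif offset < -deadband:
        steer = "left"
    else:
        steer = "straight"
    return steer, advance

# Example update: car has drifted left, next way-point not yet reached.
command = follow_step(Peak(x=140.0, intensity=1.0), Peak(x=128.0, intensity=0.3),
                      center_x=128.0)
print(command)
```

After an advance, the rotated hologram places the former top peak at the middle position, so the same function can be called again on the next frame without further state.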

For example, an experiment was set up to navigate the car from one lab to another. The labs are about 15 meters apart, joined by a common hallway. Way-points were recorded at about 30 cm intervals down the hallway. A total of 54 holograms were recorded to describe the entire path. Experimentally, the car was able to reproduce the desired path within a few inches. Furthermore, the system was very tolerant of noise, such as new objects placed in the hallway and our attempts to push the car off-course. Figure 2 shows a composite video recorded during the experiment: the correlation plane is in the top left corner, what the car sees is in the middle, and what the car expects to see as the next way-point is in the top right corner.

In  conclusion,  we  have  demonstrated  a  system  that  uses  peristrophically-multiplexed 
holograms to navigate a car in real time through our laboratory. It should be possible to build a
simple  system  that  navigates  a  car  through  the  entire  Caltech  campus  with  the  storage  capacity  of  a 
single  holographic  3-D  disk. 
Figure 1b: Holographic optical correlator using peristrophic multiplexing.


Figure  2  :  Car  navigation  experiment. 
This work passes an important milestone in the history of optoelectronic and perhaps even electronic technology in general: the demonstration of a VLSI-scalable electronic technology integrated with a high-speed, dense optoelectronic technology. With optoelectronic integration to silicon VLSI, one hopes to augment state-of-the-art silicon density and processing power with fast, high-density optical I/O. This is the first integration of dense silicon circuitry (120,000 transistors/cm²) and dense optoelectronics (28,000 devices/cm²) on a chip, all operating at a clock of 250 Mbits/sec. Of course, we only produced a small operating area (480x480 µm), but the indications of our research are that chips on the order of 1 cm² are reasonable. This development potentially has the most impact in the field of photonics since the semiconductor laser and optical fiber transformed the telecommunications industry. The reason for this is first that it eases the development of higher throughput (~ terabit/sec) switches, impacting the multi-ten billion dollar/year switching equipment market and allowing telephony traffic at projected levels in the next century. Second, it alleviates the integrated circuit I/O communication bottleneck, thus potentially affecting the entire computing industry.

Since the goal of optoelectronic integration to silicon circuits is optical I/O, modulators offer an inherent advantage over other devices: they perform both functions. Silicon detectors have limited performance, especially at the ~ Gbit/sec speeds hoped for in the near future. We have attempted growth of III-V modulators on silicon ICs, but this is hampered by the necessity of metalizing the chip after the growth cycle [2]. This leaves two possibilities for attachment of the modulators: epitaxial lift-off, where thin-film device layers are transferred to the silicon, or flip-chip bonding. Epitaxial lift-off, while interesting, was not viewed as being as manufacturable as flip-chip bonding in the near term. However, flip-chip bonding suffered from the fact that, previous to this work, substrate-transparent operation was required. We have found that 850 nm GaAs/AlGaAs modulators offer much superior performance compared to longer wavelength devices. The solution to this logical puzzle is straightforward: remove the substrate after flip-chip bonding.

Fig. 1: Three step hybridization process: (1) Fabrication, aligning, and bonding of modulator chip on silicon chip. (2) Flowing epoxy between chips, which is allowed to harden. (3) Removal of GaAs substrate using jet etcher, and deposition of AR coating. The epoxy can be removed after substrate removal as desired.

The fabrication procedure is outlined in Fig. 1. Modulators are produced in the GaAs chip whose n and p contacts are coplanar. In [7] this was accomplished by depositing thick gold over the bottom contact. Here we employ implantation. Lead-tin is deposited on these contacts as a solder using photolithography. The silicon chips are obtained from the MOSIS foundry facility. Aluminum pads mating to the modulators are designed on those chips, and a Ti/Pt/Au layer is deposited on them (in our lab) to provide a solder-wettable surface, followed by lead-tin. A precision bonder made by Research Devices in Piscataway, NJ was employed to bond the chips together. Two micron accuracy is routine.

A key feature of the technique of flip-chip bonding followed by substrate removal is the etching of outer mesas around the devices into the substrate. Then, when the substrate is removed by applying a chemical stream to it (that stops on the AlGaAs stop etch layer), isolated devices will be left. This is desirable since, if the stop etch layer were left extending over the whole chip, slight warpages would cause it to break, possibly damaging the modulators. The substrate etchant, 100:1 H2O2:NH4OH, does not attack Si or Al appreciably. However, it would attack the GaAs regions of the modulators. To protect the front faces of the chips, epoxy was flowed between the chips as shown in the middle pictorial of Fig. 1. This was done by depositing a bead of the epoxy on the side of the GaAs substrate using an optical fiber manipulated by a precision stage. The epoxy then wicked neatly between the chips. It is possible to meter the amount of epoxy so that it just fills the volume between chips. We have developed a procedure to remove the epoxy after substrate removal without damaging the devices.

Substrate  removal  offers  other 
advantages,  besides  operation  at  850  nm. 
These  include  batch  fabrication  (ability  to 
dice  a  large  chip  into  smaller  chips  after 
fabrication),  reduced  thermal-mechanical 
stress  on  the  solder  bonds,  and  elimination  of 
optical  crosstalk  due  to  in-substrate 
reflections.  Also,  having  the  resulting 
structure  be  like  a  single  chip  offers  simple 
but  perhaps  important  conveniences:  the 
ability  to  probe  and  visually  inspect  the  chip, 
and  easier  access  of  wire-bonding  tools. 
Finally,  and  perhaps  most  important, 
substrate  removal  may  allow  further  bonding 
of  multiple  arrays  of  optoelectronic  devices, 
or  possibly  lenslet  arrays,  to  the  chip. 

We have fabricated CMOS chips with 1.2 µm line rules that contain switching nodes (Fig. 2). In each pixel there are two input detectors, two modulators (each with 15x15 µm junctions), and 18 CMOS transistors. Each pixel performs a 2x1 switching operation, as in [1]. Fig. 3 shows the output of one of the pixels operating at 250 Mbits/sec. All 16 nodes had this performance.
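The density figures quoted earlier in this summary can be cross-checked against the pixel description above. The sketch below is a consistency check only, under two assumptions that the text does not state explicitly: that the 480x480 µm operating area holds the whole 4x4 array, and that the two detectors and two modulators in each pixel are all counted as optoelectronic devices.

```python
# Sketch: device and transistor density of the 4x4 smart-pixel array.
PIXELS = 4 * 4                  # 16 switching nodes
OPTO_DEVICES_PER_PIXEL = 2 + 2  # two detectors plus two modulators (assumed count)
TRANSISTORS_PER_PIXEL = 18
SIDE_UM = 480.0                 # side of the operating area, micrometres

area_cm2 = (SIDE_UM * 1e-4) ** 2  # 1 um = 1e-4 cm

device_density = PIXELS * OPTO_DEVICES_PER_PIXEL / area_cm2
transistor_density = PIXELS * TRANSISTORS_PER_PIXEL / area_cm2

print(f"{device_density:,.0f} optoelectronic devices/cm^2")
print(f"{transistor_density:,.0f} transistors/cm^2")
```

Under these assumptions the result is roughly 27,800 devices/cm² and 125,000 transistors/cm², in line with the rounded figures of 28,000 and 120,000 given in the text.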


We have also produced device arrays on bare silicon as large as 32x32. We made chains of devices with only n-contacts to test bond yield. For these we obtained 99.94% bond yield for 15x15 micron solder pads. We also made LED test arrays, but with only 95% device yield. We have attributed this to an observable intermetallic reaction that occurs between the solder and the p-type metal during solder reflow (melting). We are currently working to increase device yield to equal that of the solder bonds themselves.

Fig. 2: Photo of our 4x4 GaAs hybrid-on-Si array.

In  conclusion,  we  have  demonstrated  a 
practical  method  of  integrating  GaAs 
modulators  onto  silicon  circuits  via  flip-chip 
bonding,  followed  by  substrate  removal.  We 
have  produced  a  4x4  array  of  smart  pixels  all 
operating  at  250  Mbits/sec.  In  larger  arrays, 
we  obtain  95  %  device  yield,  and  feel  that 
this  can  improve  to  99.9  %. 
Multi-material monolithic integration can be achieved through the integration of thin film compound semiconductor devices with silicon circuitry. This type of integration enables the system designer to use the optimal material to achieve the desired cost and performance requirements of the system. Silicon, the acknowledged leading technology for low cost electronics, is a particularly attractive host substrate on which to integrate thin film devices with optoelectronic capabilities. In this paper, we report the integration of InP-based emitters and detectors with a single silicon circuit which contains an emitter drive circuit and a detector amplifier circuit. These optoelectronic integrated circuits (OEICs), which operate at a wavelength of 1.3 µm, are useful as receivers and transmitters for optoelectronic interconnection schemes which include three dimensional, massively parallel computational systems using through-silicon wafer optoelectronic interconnects, grating/waveguide optical interconnection layers for multi-chip modules, and optical fiber.

The InP-based compound semiconductor devices were grown lattice matched onto an InP
substrate,  and  were  subsequently  separated  from  the  growth  substrate  using  selective  etching, 
known  as  epitaxial  lift-off.  The  emitter  was  a  homojunction  InGaAsP  (p=3  X  10*’  cm’^)  / 
InGaAsP  (n=2  X  10**  cm’^)  /  InP  (substrate),  and  the  detector  was  a  double  heterostructure  InP 
(p=3  X  10*’ cm'^)  /  InGaAsP  (process  undoped)  /  InP  (n=10**  cm*’)  /  InP  (substrate).  Prior  to  the 
separation  of  the  devices  from  the  growth  substrate,  an  AuZn/Au  (50/200nm)  p-type  ohmic 
contact  was  vacuum  deposited  onto  each  of  the  structures,  which  was  then  patterned  to  define 
250 µm × 250 µm mesas which also served as a mesa etch mask to define devices. These mesa-etched devices were then separated from the growth substrate using selective etches to dissolve the substrate [1, 2], and were then bonded to a transparent Mylar transfer diaphragm [3].

The emitter and driver circuits were located on a single MOSIS TinyChip in 2 µm CMOS. The driver circuit for the light emitting diode is a three stage transimpedance driver, shown in Figure 1. Each stage consists of an analog inverter mirrored to the input of the next stage, with the mirror portion of the last stage replaced by the LED. The detector amplifier is a single diode-connected n-type device. Overglass cuts to the emitter driver and detector amplifier inputs (two per circuit) were included in the MOSIS design file.

To integrate the thin film detector onto the silicon amplifier and the thin film emitter onto the silicon driver, Ti/Au pads were deposited onto the CMOS circuit pads to realize electrical connection between the thin film optoelectronic devices and the circuits. The detector on the Mylar diaphragm was then aligned and bonded to the pad connected to the amplifier. In the same fashion, the emitter on the Mylar diaphragm was aligned and bonded to the pad connected to the driver circuit. The circuit was then planarized using spin coated polyimide. An Al mask was vacuum deposited and windows were opened in the polyimide using a reactive ion etch of CHF3/O2. The n-type contact, AuGe/Ni/Au, was then evaporated onto the top of the devices, connecting, respectively, the detector to the amplifier and the emitter to the driver. Optical windows were then opened in the n-type contact top metallization.

The silicon driver circuit with integrated emitter and the silicon amplifier circuit with integrated detector were then individually tested. To test the detector, the output from a Hewlett Packard Lightwave Multimeter Emitter operating at a 1.3 µm wavelength, output power of 780 µW, and a square pulse rate of 1 kHz was incident on the integrated detector. The amplifier was biased at 1.8 V. Figure 2 shows the output of the silicon amplifier circuit when this input light illuminated the integrated thin film detector, producing a 1.1 V peak to peak square wave (displayed as 11 V through a 10X scope magnifier), demonstrating excellent signal to noise ratio. This is consistent with detector responsivities of 0.5 A/W measured from similar samples coupled with the variable resistance of the amplifier, which is in the MΩ range.

To  test  the  emitter,  a  square  wave  electrical  signal,  shown  in  the  top  of  Figure  3,  was 
input to the emitter driver at V_in, shown in Figure 1, while the power supply bias, V_dd, was fixed.
This resulted in the output signal shown in the lower trace in Figure 3. This trace shows the
output of the integrated light emitting diode as detected through a multi-mode fiber into a Hewlett
Packard Lightwave Multimeter Power Sensor operating at a 1.3 µm wavelength, clearly
indicating  that  the  integrated  InP-based  light  emitting  diode  is  being  driven  by  the  silicon  emitter 
driver,  which  is  controlled  through  the  external  electrical  input  to  the  silicon  circuit. 


Thus,  integration  of  both  a  thin  film  InGaAsP  homojunction  emitter  and  a  thin  film 
InP/InGaAsP/InP  double  heterostructure  detector  with  a  single  foundry  silicon  circuit  containing 
both a detector amplifier and an emitter driver has been demonstrated. This type of integration
demonstrates  that  multi-material  thin  film  devices  can  be  integrated  onto  the  same  silicon  circuit, 
thus  providing  to  the  designer  an  expanded,  multi-functional  design  space  for  optimized  systems. 


Figure 1. Schematic of the silicon transmitter circuit.
A promising route for the construction of smart pixels is to flip-chip bond III-V
semiconductor devices as detectors and modulators onto silicon circuitry. InGaAs quantum well
devices grown on GaAs substrates and operating at around 1 µm provide a good option for the
III-V devices since there are high power lasers available, including Nd:YLF at 1047 nm, and
substrate removal is not necessary. Silicon CMOS is attractive for the electronics since it is a
mature technology, allows very high packing density and has the low power consumption
necessary for systems based on many channels each with a high degree of smartness. In our work
we have so far used 1 µm double metal n-well CMOS and future devices will be fabricated using
0.7/0.8 µm CMOS. The CMOS process limits the available voltage swing for driving the InGaAs
modulators to 5 V.

The simplest design for the detector/modulator is that of the S-SEED, operating at the
wavelength λ0 (coincident with the peak exciton absorption at zero applied field). It has been shown
for GaAlAs SEED devices that the optimum operating condition, for a 10 V swing, is at a
wavelength, λ1, 6 nm longer than the exciton peak.

An InGaAs/GaAs SEED was fabricated. It consisted of 100 periods of 8.2 nm
In0.23Ga0.77As wells and 5.6 nm GaAs barriers, grown pseudomorphic to a relaxing InGaAs
buffer, which formed the intrinsic region of a pin diode. From the measured performance of an
InGaAs/GaAs diode using the method which minimises the system power [2], the optimum
performance for an InGaAs/GaAs SEED with a 5 V swing was calculated to be at a wavelength λ1
13 nm longer than the exciton peak, see figure 1. This device was designed to operate at λ0 and
the exciton weakens considerably at high fields, see figure 2, degrading the performance at λ1.
This is believed to be due to the poor confinement of holes resulting from shallow valence band
wells of around 90 meV. The addition of aluminium to the barrier increases the confinement of the
holes, with 15% Al giving the same confinement (127 meV) as for GaAs/Al0.30Ga0.70As wells.
The design of the new device had 95 periods of 8.8 nm In0.23Ga0.77As wells and 6.2 nm
Al0.15Ga0.85As barriers forming the intrinsic region. The results on the performance of the devices
with Al barriers optimised for λ1 operation will be presented.

The hybrid detector/modulator which a SEED device represents results in a compromise
between detector and modulator design. Ideally the detector should be operated at λ0, where the
absorption is high, and with fast sweep-out compared to the long recombination time to give a
high quantum efficiency. The modulator should be operated at λ1 to maximise the modulation
contrast, minimise the insertion loss and have low quantum efficiency to minimise the heat
generated by the photo-current and the size of the CMOS drive stage. In order to fulfil these two
conflicting requirements, it is advantageous to independently optimise the detector and the
modulator. We recently successfully demonstrated such a device in GaAlAs, by growing a
modulator on top of a detector and removing the modulator layer to expose the detector. The
device is illuminated such that the light to be detected does not pass through the modulator or the
substrate. This technique cannot be replicated with a device that is flip-chip bonded (such as
InGaAs) due to the inverted geometry. To expose the detector would require undercutting the
modulator and mirror. The modulator layer can be designed such that it is transparent at zero bias.
In this case when the structure is used as a detector, light passes through the transparent
modulator layer. In order to use the structure as a modulator it is necessary to remove the
detector section and deposit a mirror on the modulator as shown in figure 5.

The  band-gap  of  the  detector  layer  must  be  smaller  than  the  energy  of  the  incident  photons, 
see  figure  4,  and  therefore  is  tolerant  to  wavelength.  It  has  shallow  wells  to  give  a  high  quantum 
efficiency  at  low  bias  fields.  Instead  the  modulator  layer  has  a  zero  applied  bias  band-edge  (V=0) 
at  shorter  wavelength  than  the  operating  wavelength  so  it  is  transparent  in  the  detector  device. 
When  used  as  a  modulator  it  is  pre-biased  (V2)  to  bring  the  exciton  peak  close  to  the  operating 
wavelength.  This  maximises  the  modulation  depth  available  from  the  voltage  swing  (V1-V2).  The 
modulator  is  designed  with  high  barriers  to  ensure  the  exciton  does  not  broaden  at  high  fields  and 
to  minimise  the  photo-current.  A  short  non-radiative  carrier  lifetime  in  the  modulator  would 
ensure  low  quantum  efficiency  and  suitability  for  high  power  operation. 

As  a  comparison  to  SEED  devices,  we  can  use  a  figure  of  merit  equivalent  to  that  of  [2] 
given  by 

)Ld  in  pm 

where the factors are the high and low reflectivities of the modulator, the unbiased
transmission of the modulator region, the absorption coefficient of the detector, and the
length of the detector, L_D. The value of this figure of merit can be calculated using the data for the
InGaAs/GaAs diode, for a detector at the exciton peak and a modulator 14 nm distant from the
peak with V2 = 5 V and (V1 - V2) = 5 V, and is 0.45, more than twice that of conventional InGaAs
SEEDs (0.19, see figure 1), and the same as for GaAs-based SEEDs with a 5 V swing. Results
of the performance of a device with a modulator layer consisting of 100 periods of 8.3 nm
In0.23Ga0.77As wells with 5.8 nm AlAs barriers, and a detector layer of 100 periods of 9.0 nm
In0.23Ga0.77As wells with 6.3 nm GaAs barriers, will be presented.

*D.  J.Goodwill  is  now  with  the  Department  of  Electrical  and  Computer  Engineering,  University  of 
Colorado,  Boulder,  CO 
Figure 1: Figure of merit for the
In0.23Ga0.77As/GaAs device as a function of
wavelength and number of quantum wells


Figure  2 :  Graph  showing  that  the  exciton 
peak  is  significantly  weakened  at  high  fields 
degrading  performance. 
Figure 3: Separation of detector and
modulator.


Figure  4  :  Schematic  of  wavelengths  of  band 
edges  for  detector  and  modulator. 
with 7.2 femtojoule external optical input energy

Most optoelectronic switches are characterized by a trade-off between the optical input sensitivity, the
operation frequency and the area on chip. Fast operation usually occurs at the expense of sensitivity, or
else requires considerable chip area for fast amplification in several stages of the input signal. It has
recently been shown that specially designed optical thyristors, called depleted thyristors, are not subject to
this trade-off [1]. The thyristor layer structure must be conceived such that the device can be depleted of
carriers by a negative anode-to-cathode voltage pulse. Such a structure has intrinsically high speed
capabilities,  which  can  be  combined  with  extreme  optical  input  sensitivity  by  using  differential  pairs  of 
thyristors  instead  of  single  thyristors  [2].  The  differential  pair  (Fig.  1)  consists  of  two  thyristors  A  and  B 
connected  in  parallel,  which  have  a  common  series  resistance  Rc  [3].  When  thyristor  A  is  on  and  thyristor 
B  is  off,  the  differential  switch  is  in  the  "1"  state;  with  A  off  and  B  on  the  switch  is  in  the  "0"  state.  The 
thyristor  in  the  on-state  emits  light.  This  allows  cascaded  operation  using  the  same  type  of  optical 
thyristor  pair  both  for  the  emitting  and  the  receiving  side  of  optical  interconnects,  or  for  optical 
computing. 

In  this  paper,  we  present  the  fastest  thyristor  optoelectronic  switch  reported  to  date,  and  at  the  same  time 
demonstrate  experimentally  that  the  bitrate  transmitted  by  differential  thyristor  switches  can  be  increased 
without  penalty  of  optical  input  sensitivity.  Our  thyristor  layer  structure,  grown  by  MBE  on  an  intrinsic 
GaAs substrate, consists of: 1 µm 3×10^[illegible] cm^-3 p-type GaAs, 150 nm 3×10^[illegible] cm^-3 p-type
Al0.30Ga0.70As, 130 nm 2×10^[illegible] cm^-3 n-type GaAs, 710 nm 1.4×10^[illegible] cm^-3 p-type GaAs, 200 nm 3×10^[illegible]
cm^-3 n-type Al0.10Ga0.90As. This thyristor structure is designed such that the device switches on and off
with  small  voltage  levels:  -3.5  V  to  -4.0  V  is  sufficient  for  turn-off,  while  the  break-over  voltage  is  +2.7  V 
(see Fig. 2). We make monolithic differential pairs consisting of two thyristors of 20×30 µm² each, with
a series resistance of 800 Ω. The total area consumed by such a differential pair including the series
resistance is 60×45 µm².
Fig. 1: Differential thyristor switch.




Fig.  2:  Current-voltage  characteristics  of  the  thyristors 
and  the  series  resistance  Rc  of  Fig.  1,  showing  the 
operation  points  of  the  winner  and  of  the  loser. 


The  pulse  train  applied  on  the  thyristor  pair  is  shown  in  the  top  panel  of  Fig.  3.  Each  pulse  consists  of 
three  phases.  First,  the  voltage  is  set  to  -3.5  V  ...  -4  V  during  5  ns.  This  “reset”  pulse  is  sufficient  for 
extracting  all  free  carriers  from  the  center  p-type  and  n-type  GaAs  layers,  such  that  the  thyristors  keep  no 
memory of their previous state. Then, the voltage is ramped up to +3.5 V. During this ramp, the
thyristors  of  the  differential  pair  are  given  optical  inputs  (second  and  third  panel  of  Fig.  3)  emitted  by 
thyristors  A’  and  B’  with  identical  structure  and  size  as  the  thyristors  A  and  B  of  the  pair  (in  order  to 
demonstrate  cascaded  operation).  The  third  phase  is  the  switch-on  phase:  the  voltage  is  kept  above  2.7  V 
(the  break-over  voltage)  during  5  ns.  The  thyristor  of  the  pair  which  has  received  an  optical  input  then 
switches  on  (the  winner),  while  the  other  thyristor  of  the  pair  (the  loser)  remains  off.  The  state  of  the 
winner  and  of  the  loser  are  shown  in  Fig.  2.  As  shown  in  Fig.  3,  the  optical  inputs  are  provided  in  the 
order  AABBBAABBB,... 
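
The three-phase pulse described above can be sketched as a simple piecewise function. This is only an illustration: the linear shape of the ramp, the hold level of +3.5 V during the switch-on phase, and the function and parameter names (`thyristor_pulse`, `ramp_ns`, and so on) are assumptions, not details given in the text.

```python
def thyristor_pulse(t_ns, ramp_ns, v_reset=-3.5, v_on=3.5, reset_ns=5.0, on_ns=5.0):
    """Voltage (V) applied to the differential pair at time t_ns.

    Assumed shape: 5 ns reset, a linear ramp of length ramp_ns, then 5 ns
    held at v_on (which must exceed the 2.7 V break-over voltage).
    """
    period = reset_ns + ramp_ns + on_ns
    t = t_ns % period
    if t < reset_ns:
        # Phase 1: reset pulse extracts free carriers from the centre layers.
        return v_reset
    if t < reset_ns + ramp_ns:
        # Phase 2: ramp, during which the optical inputs are applied.
        frac = (t - reset_ns) / ramp_ns
        return v_reset + frac * (v_on - v_reset)
    # Phase 3: switch-on, the thyristor that received light turns on.
    return v_on


if __name__ == "__main__":
    for t in (0, 5, 10, 15, 19):
        print(t, "ns ->", thyristor_pulse(t, ramp_ns=10.0), "V")
```

Shortening `ramp_ns` shortens the period, which is how the pulse frequency is varied in the measurements that follow.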
Fig. 3: Applied voltage pulses on the differential pair (A+B), on the light-emitting thyristor A’ illuminating
thyristor A and on the light-emitting thyristor B’ illuminating thyristor B.
The frequency of the pulses shown in Fig. 3 was varied by changing the ramp time of the second phase.
The external optical energy necessary for correct switching was measured as a function of the frequency
of the pulses. Fig. 4 shows the result. The maximum frequency reached is 50 MHz, corresponding to 50
Mbit/sec operation of the differential optoelectronic switch. The external optical energy is 7.2 fJ ± 0.5 fJ,
corresponding to 12 aJ/µm². Importantly, this energy is constant, independent of the frequency.
This  is  a  result  of  the  application  of  reset  pulses  to  clear  the  thyristors’  state  before  application  of  the  light 
input.  The  performance  of  our  differential  thyristor  switches  is  to  date  limited  by  the  light-emission 
efficiency  of  the  thyristors,  and  it  can  also  further  be  enhanced  by  scaling  down  the  area  of  the  thyristors 
of  the  receiving  pair  and  by  decreasing  the  pair's  series  resistance. 
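
The quoted figures can be cross-checked with a few lines of arithmetic. The sketch below assumes that the energy density refers to the 20×30 µm² area of a single thyristor and that one period consists of the 5 ns reset, the ramp, and the 5 ns switch-on phase; the 10 ns ramp time at 50 MHz follows from that assumption and is not stated in the text. The function names are illustrative.

```python
def pulse_frequency_mhz(ramp_ns, reset_ns=5.0, on_ns=5.0):
    """Pulse repetition frequency in MHz for a given ramp time in ns."""
    return 1e3 / (reset_ns + ramp_ns + on_ns)


def energy_density_aj_per_um2(energy_fj, width_um, height_um):
    """Optical input energy per unit device area, in aJ/um^2."""
    return energy_fj * 1e3 / (width_um * height_um)


if __name__ == "__main__":
    # A 10 ns ramp gives a 20 ns period, i.e. 50 MHz.
    print(pulse_frequency_mhz(10.0))
    # 7.2 fJ over a 20 x 30 um^2 thyristor gives 12 aJ/um^2.
    print(energy_density_aj_per_um2(7.2, 20.0, 30.0))
```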


Fig.  4:  The  external  optical  input  energy  of  our  differential  thyristor 
switch  is  7.2  fJ,  independent  of  the  bitrate  up  to  50  Mbit/sec. 


In conclusion, we present cascadable optoelectronic switches with a total area of 60×45 µm² capable of
transmitting digital optical information at 50 Mbit/sec with 12 aJ/µm² external optical input
energy. This compares very favorably to other reported optoelectronic switches such as the
resonant-detection/resonant-emission VSTEP [4], which needs 400 aJ/µm² below 4 Mbit/sec (and 4000 aJ/µm² at
12 Mbit/sec) and the FET-SEED, the optical energy of which is reported to be 1630 aJ/µm² at 200
Mbit/sec [5], rapidly increasing with increasing frequency.
Demonstration of 2-dimensional data transcription
between 8x8 arrays of completely-depleted optical thyristors.

Differential pairs of PnpN optical thyristors are among the most promising devices for
digital parallel optical information processing systems. Very recently it has been shown that
these AlGaAs-based detector-emitter devices can be cascaded at 50 MHz with only 7.2
fJ external optical input energy [1], and that this optical input energy scales with device area,
resulting in a record sensitivity of 15 aJ/µm² [2]. The layer structure of these optical thyristors
has been designed such that all free carriers can be extracted from the center p- and n-type
layers by applying a small negative anode-to-cathode voltage pulse (complete depletion for a
turn-off voltage of -4 V), while switching on the device requires a break-over voltage of 2.7 V.
To be of use in practical systems the device has also been engineered to be cascadable. In this
way, an optical thyristor operating as a detector is sensitive to the light that an identical
element, working as an emitter, generates. Moreover, a differential pair configuration has been
adopted to increase the optical input sensitivity [3]. Finally, correct operation of 8×8 monolithic
arrays (see Fig. 1a) of these completely depleted, cascadable differential pairs has been
demonstrated [4].

In  this  paper  we  study  the  practical  implementation  of  these  optical  thyristor  arrays  in 
systems.  We  demonstrate  for  the  first  time  the  transcription  of  optical  data  between  arrays  of 
completely-depleted  optical  thyristors.  This  basic  digital  parallel  optical  data  communication  is 
required  for  all  systems,  whether  these  elements  are  to  be  implemented  as  two  dimensional 
optical  logic  planes  for  digital  optical  computing  purposes,  or  as  smart  pixels  when  flip-chipped 
to  VLSI  Silicon  circuitry,  or  simply  as  transceivers  for  interconnecting  multi  chip  modules. 

Fig. 2 shows the lay-out of the two-dimensional data transcription demonstrator. The
set-up consists of two 8x8 arrays (array 1 and array 2) imaged from one to the other via a
compact optical system formed from two 0.2 pitch 5 mm diameter GRadient INdex
(GRIN) rod lenses and a 10 mm cube splitter. Each differential pair of the array consists of two
identical thyristors with dimensions of 30×45 µm². The cube splitter, placed between the
two lenses, allows the input of data from an auxiliary plane, containing either a single
electronically addressable optical thyristor pair (see Fig. 1b), a mask illuminated by a
near-infrared light emitting diode or a spatial light modulator with near-infrared backlighting. In
addition it allows the inspection of the data content of array 2 using a charge coupled device
camera (CCD2). To input data from the input plane to array 1, an optical system is used that is
similar to the one that is interconnecting array 1 and array 2. Via CCD1 the input data can be
inspected, while CCD3 allows the data content of the first thyristor plane to be viewed.
Both the individual thyristor and the arrays are controlled by a reset-sense-switch voltage
sequence (-6V/0V/+8V), generated with commercially available synchronized arbitrary
waveform generators.

With  this  set-up  we  have  first  demonstrated  that  it  is  possible  to  change  the  state  of  each 
individual  thyristor  pair  of  array  1  by  scanning  it  with  the  input  of  the  single  optical  thyristor 
from  the  input  plane.  We  then  transferred  data  from  array  1  to  array  2  and  back.  At  present  we 
are  building  a  dedicated  modular  platform  to  demonstrate  the  compact  nature  of  the  prototype 
system,  in  which  optical  data  input  will  be  provided  via  a  liquid  crystal  spatial  light  modulator. 
Further  results  of  parallel  data  transcription  operations  will  be  presented,  together  with  more 
detailed  measurements  of  accuracy,  speed  and  bit  error  rates.  From  these  we  will  project  future 
system  performances  and  discuss  the  perspectives  for  processing  architectures  based  on  optical 
thyristors. 
Fig. 1a: An 8x8 array of optical thyristor differential pairs. Fig. 1b: A single electronically
addressable optical thyristor pair.
Although backward error propagation learning in photorefractive
crystals has been previously investigated by
simulation and experiment, theoretical results governing
convergence have been lacking. In this paper we prove analytically
that such learning in multilayer neural networks
implemented using photorefractive crystals can have similar
convergence properties to those of an ideal backward
error propagation network. Further, we derive relationships
between two learning parameters that will ensure
these convergence properties are satisfied under the assumption
of small weight-update sizes, and we relate
these parameters to spatial light modulator gain and holographic
grating update exposure energy.

Artificial neural networks “learn” by adjusting their interconnection
weights based on a prescribed learning procedure.
A large class of these learning procedures have
weight updates that correspond to an outer product between
an input vector and a training vector. For backward
error propagation learning [1] this weight update is
given by

$$\Delta W_{ij}^{(l)} = \alpha\, \delta_i^{(l)}\, y_j^{(l-1)}, \qquad (1)$$

in which $W_{ij}^{(l)}$ is the interconnection strength between
neuron $j$ in layer $l-1$ and neuron $i$ in layer $l$, $\alpha$ is the
learning rate parameter, $\delta_i^{(l)}$ is the backward propagating
error, and $y_j^{(l-1)}$ is the forward propagating signal. The
forward propagating signal is given by $y_i^{(l)} = f(p_i^{(l)})$, in
which $f$ is the neuron activation
function and $p$ is the neuron potential. The neuron activation
function is generally a soft threshold; in this paper,
we use $f(p) = 1/(1 + \exp(-4p))$. For the remainder of
this paper, the layer superscript will be dropped from
equations for which the layer relationships are clear. The
physical realization of a large neural network with learning
capabilities requires a large number of continuously
modifiable interconnections. Photorefractive materials,
when used as the interconnection medium in an optical
implementation of a network, can provide a large number
of such continuously modifiable interconnections with updates
in the form of an outer product.
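
The forward signal and the outer-product update of Eq. (1) can be illustrated with a short sketch. It assumes the logistic form $f(p) = 1/(1+\exp(-4p))$ for the soft threshold and takes the neuron potential to be the weighted sum of the previous layer's outputs; the function names are hypothetical and chosen only for this example.

```python
import math


def activation(p):
    """Soft-threshold activation, assumed to be 1 / (1 + exp(-4p))."""
    return 1.0 / (1.0 + math.exp(-4.0 * p))


def forward(weights, y_prev):
    """y_i = f(p_i), with p_i assumed to be sum_j W_ij * y_j.

    weights is a list of rows, one row of W_ij per neuron i.
    """
    return [activation(sum(w * y for w, y in zip(row, y_prev))) for row in weights]


def outer_product_update(alpha, delta, y_prev):
    """Delta W_ij = alpha * delta_i * y_j, the outer product of Eq. (1)."""
    return [[alpha * d * y for y in y_prev] for d in delta]


if __name__ == "__main__":
    print(forward([[0.0, 0.0]], [1.0, 1.0]))
    print(outer_product_update(0.5, [1.0, -2.0], [1.0, 0.5]))
```

Each row of the returned update belongs to one receiving neuron $i$, so the update has the same shape as the weight matrix it modifies.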

In this effort, under certain approximations and assumptions
(detailed below) we have derived the neural-space
weight updates for two classes of optical architectures.
In the first, illustrated in Fig. 1, a single coherent
source (SCS) is used for both the compute and the update
phases; signals are represented by electric field amplitudes
of corresponding plane waves that effect the optical interconnection.
In the second, illustrated in Fig. 2, an
array of mutually incoherent/individually coherent (I/C)
sources is used for both the compute and update phases;
signals are represented by intensities of the corresponding
plane waves that effect the optical interconnection [2].


Figure 1: Single Coherent Source Architecture. A single coherent
source illuminates both spatial light modulators (SLM$_\delta$
and SLM$_y$) such that when shutter S1 is open the intensity
pattern within the photorefractive crystal (PRC), caused by
the interference of light from SLM$_\delta$ with the light from SLM$_y$,
modifies the stored holographic gratings. In the forward propagation
mode, shutter S1 is closed, and the light from SLM$_y$
is diffracted by amounts proportional to the interconnection
weights, forming a coherent sum on each detector array element.


We assume the following: each interconnection grating
is formed by the interference of two plane waves; all of the
gratings completely overlap; all of the light that does not
contribute directly to the update of a grating is treated
as incoherent background illumination; the background
illumination is constant; the SLMs are sampled in such
a way as to avoid grating degeneracy [3]; all other cross
talk is ignored; the index modulations of the individual
gratings are small; the charges within the crystal move in
accordance with Kukhtarev’s single-active-species charge
transport model [4]; charge is transported by diffusion
only; the interconnection gratings are phase only (no absorption
modulation); the update time is significantly less
than the photorefractive time constant; and the spatial
light modulators (SLMs) modulate only the amplitudes
of the signals. In terms of the neural signals, it is assumed
that $0 \le |\delta_i| \le 1$ and $0 \le y_j \le 1$, in which the
conditions $|\delta_i| = 1$ and $y_j = 1$ each correspond at the
physical level to the maximum transmission or reflection
state of the corresponding SLM.

Under these assumptions, the neural-level weight update
for the SCS class of architectures has been shown in
[5] to be

$$\Delta W_{ij} = \underbrace{\alpha\, y_j\,(\delta_i^{+} - \delta_i^{-})}_{\text{Outer Product}} - \underbrace{\beta\, W_{ij}}_{\text{Decay}}, \qquad (2)$$

in which $\beta$ is the decay rate coefficient, $\delta_i^{+} = (1/2)(|\delta_i| + \delta_i)$
and $\delta_i^{-} = (1/2)(|\delta_i| - \delta_i)$. Similarly, the neural-space
weight update for the I/C class of architectures can be
Figure 2: Incoherent/Coherent Architecture. In this architecture,
SLM$_y$ is placed in the image plane of an array of individually
coherent but mutually incoherent sources and SLM$_\delta$
is placed in the Fourier plane of this source array. The light
from pixel $i$ of SLM$_\delta$ has an equal component from each source;
therefore, when a holographic grating (interconnection) is updated
only a small fraction of the light (corresponding to the
coherent component) from this pixel contributes to the update.
In the forward propagation mode, shutter S1 is closed,
and the light from SLM$_y$ is diffracted by amounts proportional
to the interconnection weights, forming an incoherent sum on
each detector array element.


Figure 4: Portion of region of convergence corresponding to a
small weight update step size for (a) SCS class of architectures
(b) I/C class of architectures.


Figure 3: Dual rail encoding for a bipolar interconnection
for a unipolar output. The effective bipolar weight, $W_{ij} = U_{ij}^{+} - U_{ij}^{-}$,
is computed in the neuron unit.


shown to be

$$\Delta W_{ij} = \alpha\,\cdots \qquad (3)$$

in which $W_{ij} = U_{ij}^{+} - U_{ij}^{-}$, and $U_{ij}^{\pm} \ge 0$ are the two unipolar
components of the bipolar weight $W_{ij}$. In both architectures,
bipolar weights are implemented using “dual-rail”
encoding as illustrated in Fig. 3.

The form of these updates is not entirely consistent
with the ideal outer product form in Eq. (1) that is common
to many neural network learning algorithms. In both
cases the weight update includes a decay term corresponding,
at the physical level, to the partial erasure of previously
written gratings [5], [6]. In this effort, we derive
the convergence relationship for the backward error propagation
learning algorithm in which the weight update is
governed by the properties of photorefractive crystals, as
modeled by Eqs. (2) and (3).
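
The effect of the decay term can be illustrated with a small numerical sketch. It assumes, for illustration only, that the decay term has the simple form $\beta W_{ij}$, which is consistent with the maximum achievable weight $\alpha/\beta$ quoted later for the SCS class but is not guaranteed to match the exact expressions of Eqs. (2) and (3). All function names are hypothetical.

```python
def split_error(delta):
    """Split a bipolar error into unipolar parts: delta = d_plus - d_minus."""
    d_plus = 0.5 * (abs(delta) + delta)
    d_minus = 0.5 * (abs(delta) - delta)
    return d_plus, d_minus


def decayed_update(w, y, delta, alpha, beta):
    """Outer-product term minus an assumed decay term beta * w."""
    d_plus, d_minus = split_error(delta)
    return alpha * y * (d_plus - d_minus) - beta * w


def iterate_weight(alpha, beta, steps, y=1.0, delta=1.0, w=0.0):
    """Apply the update repeatedly with signals held at their maximum values."""
    for _ in range(steps):
        w += decayed_update(w, y, delta, alpha, beta)
    return w


if __name__ == "__main__":
    # With y = delta = 1 the weight saturates near alpha / beta.
    print(iterate_weight(alpha=0.1, beta=0.01, steps=3000))
```

Under this assumption the weight cannot grow without bound: repeated updates at full signal strength settle where the outer-product gain balances the erasure.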

Backward  error  propagation  uses  gradient  descent  to 
minimize  the  global  error  function,  Jo,  given  by 


Figure  5:  Complete  region  of  convergence  for  the  1 
of  architectures 


in which $s^{(m)}$ is an input training vector, $d(s^{(m)})$ is the desired
output for input vector $s^{(m)}$, $y^{(L)}(s^{(m)})$ is the actual
neural network output for input vector $s^{(m)}$, and $W$ is a
vector of all weights in the network. The backward propagating
error signal that minimizes this error function is
given by [1]

$$\delta_i^{(l)} = \begin{cases} f'(p_i^{(l)}) \sum_k \delta_k^{(l+1)} W_{ki}^{(l+1)}, & 1 \le l < L, \\ \left[d_i(s^{(m)}) - y_i^{(L)}(s^{(m)})\right] f'(p_i^{(L)}(s^{(m)})), & l = L, \end{cases} \qquad (5)$$

in which $L$ is the output layer. The convergence criterion
for learning is generally that $J_0$ must be less than a predefined
threshold; the region in weight space for which
this condition is satisfied will be denoted by $W_c$. Hereafter,
we will assume that there are no local minima not
contained in $W_c$.

In the limit of small step size ($\|\Delta W\|$ small, in which
$\Delta W$ is the composite vector of all weight updates over
all connections in the network), a necessary (except at
local extrema that are not local minima) and sufficient
condition to ensure that the global error function ($J_0$) is
reduced at each iteration, and a sufficient condition to
ensure convergence, can be shown to be

$$(\Delta W)^{T}\,(-\nabla J_0(W)) > 0 \quad \forall\, W \notin W_c. \qquad (6)$$
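
Condition (6) states that the update must have a positive projection onto the negative gradient. A minimal sketch of the check follows; the helper `decayed_step`, which adds an assumed decay term $-\beta W$ to an ideal gradient step, is hypothetical and serves only to show how decay can violate the condition.

```python
def satisfies_descent_condition(delta_w, grad_j):
    """True if (Delta W)^T (-grad J0) > 0 for flattened weight vectors."""
    return sum(dw * (-g) for dw, g in zip(delta_w, grad_j)) > 0.0


def decayed_step(w, grad_j, alpha, beta):
    """Ideal gradient step with an assumed additional decay term -beta * w."""
    return [-alpha * g - beta * wi for wi, g in zip(w, grad_j)]


if __name__ == "__main__":
    grad = [1.0, 0.0]
    # Decay pulls in the descent direction here, so the condition holds.
    print(satisfies_descent_condition(decayed_step([10.0, 0.0], grad, 0.1, 0.1), grad))
    # Decay opposes and outweighs the gradient step here, so it fails.
    print(satisfies_descent_condition(decayed_step([-10.0, 0.0], grad, 0.1, 0.1), grad))
```

The second case shows why the ratio of the learning rate to the decay coefficient matters: the outer-product term has to dominate the erasure for the error to keep decreasing.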

For the SCS class of architectures, this convergence condition
can be obtained by substituting Eq. (2) into Eq.
(6), which gives


V  E, )' 


(7) 


in which $W_{max}$ is the maximum achievable weight for a
given $\alpha$ and $\beta$. Similarly, this convergence condition can
be obtained for the I/C class of architectures by substituting
Eq. (3) into Eq. (6), which gives


(8) 


r  u+  6i  >o 

\ur.  6i<0 


.  (9) 


The relationships between the learning parameters $\alpha$ and
$\beta$ in each of Eqs. (7) and (8) define the Region of Convergence
(ROC) of the backward error propagation learning
algorithm in learning parameter [$(\alpha, \beta)$] space. In both
cases the lower boundary of the ROC is a line through
the origin in $(\alpha, \beta)$ space; this line is defined by the $\alpha$'s
and $\beta$'s for which Eqs. (7) and (8) are equalities. For a
given $\alpha$ and $\beta$ the distance from the origin [$(\alpha^2 + \beta^2)^{1/2}$]
is proportional to the step size of the weight update.

We empirically generated the ROCs by using the
weight updates of Eqs. (2) and (3) to solve the XOR
sample problem (with a 2:3:1 network). Figure 4 contains
plots of the ROCs for the region in $(\alpha, \beta)$ space for which
the assumptions leading to Eqs. (7) and (8) are valid
(small step size). The lower boundaries of these ROCs
are approximately linear, in agreement with theoretical
predictions. In a given simulation the network is said to
converge if the number of iterations required to satisfy
the convergence criterion for learning (given after Eq. (5)
above) is below a predefined maximum number of iterations.
Because this number is finite there will always be a
finite minimum step size required for simulation convergence,
as evidenced in the empirically generated ROCs
(a simulation artifact). The graphically measured slope
of the lower boundary of the ROC for the SCS class of
architectures is approximately 95 and for the I/C class of
architectures approximately 17 for this particular problem.
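
The procedure for generating an empirical ROC can be sketched as a parameter sweep over $(\alpha, \beta)$. The code below is an illustration under several assumptions that are not taken from the text: the decay term is modeled as $\beta W$, each neuron has a bias weight, weights are updated after every training pattern, and the threshold, iteration limit, initial weight range, and random seed are arbitrary. It is not expected to reproduce the slopes reported above.

```python
import math
import random

XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def f(p):
    """Assumed activation 1 / (1 + exp(-4p)); p is clamped to avoid overflow."""
    p = max(-100.0, min(100.0, p))
    return 1.0 / (1.0 + math.exp(-4.0 * p))


def trains_xor(alpha, beta, max_iters=2000, threshold=0.01, seed=0):
    """Return True if the error J0 falls below threshold within max_iters epochs."""
    rng = random.Random(seed)
    # 2:3:1 network; the last weight in each row is a bias weight (assumption).
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(3)]
    w2 = [rng.uniform(-0.1, 0.1) for _ in range(4)]
    for _ in range(max_iters):
        j0 = 0.0
        for (x1, x2), target in XOR:
            x = (x1, x2, 1.0)
            h = [f(sum(w * xi for w, xi in zip(row, x))) for row in w1] + [1.0]
            y = f(sum(w * hi for w, hi in zip(w2, h)))
            j0 += 0.5 * (target - y) ** 2
            # Backward propagating errors, using f'(p) = 4 f(p) (1 - f(p)).
            d_out = (target - y) * 4.0 * y * (1.0 - y)
            d_hid = [d_out * w2[i] * 4.0 * h[i] * (1.0 - h[i]) for i in range(3)]
            # Outer-product update with assumed decay: dW = alpha*delta*y - beta*W.
            w2 = [w + alpha * d_out * hi - beta * w for w, hi in zip(w2, h)]
            for i in range(3):
                w1[i] = [w + alpha * d_hid[i] * xi - beta * w
                         for w, xi in zip(w1[i], x)]
        if j0 < threshold:
            return True
    return False


def region_of_convergence(alphas, betas, **kwargs):
    """Map each (alpha, beta) pair to whether training converged."""
    return {(a, b): trains_xor(a, b, **kwargs) for a in alphas for b in betas}


if __name__ == "__main__":
    roc = region_of_convergence([0.0, 0.5, 1.0], [0.0, 0.001, 0.01], max_iters=500)
    for point, converged in sorted(roc.items()):
        print(point, converged)
```

Because the iteration limit is finite, points with very small step sizes are reported as non-convergent, which is the simulation artifact noted above.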

The  slope  of  the  line  defining  the  lower  boundary  of  the 
ROC in learning parameter space determines the minimum
SLM gain in an optical implementation that is
required  for  reliable  convergence  of  the  network  during 
learning,  as  follows.  The  gain  required  to  realize  a  maxi¬ 
mum  possible  weight  Wmax  =  ol/P  in  the  SCS  class  of  ar¬ 
chitectures  can  be  shown  to  be  G  =  (9W^ax^^)/ (4r/max), 
in  which  N  denotes  the  number  of  neurons  in  each  layer 
(for  simplicity  we  have  assumed  that  all  layers  have  the 
same  number  of  neurons)  and  r/max  is  a  function  of  the 
saturation  intensity  diffraction  efficiency  [7],  [8].  The 
SLM  gain  required  in  the  I/C  class  of  architectures  to  re¬ 
alize  a  maximum  weight  of  Wmax  =  (a//?)^  can  be  shown 
to  be  G  -  {mmaxN^)l{^r]max). 

The  complete  ROC’s  in  learning  parameter  space  are 
shown  in  Figs.  5  and  6.  Our  simulations  indicate  that 
as  the  step  size  increases  and  the  assumptions  leading 
to Eqs. (7) and (8) are violated, the error function no
longer decreases monotonically, thus at times preventing
convergence.  The  roughness  in  the  boundary  of  the  ROC 
is indicative of this behavior. The step size ($\|\Delta W\|$) is
directly  proportional  to  the  maximum  exposure  energy 
used  in  an  interconnection  update.  Therefore,  in  order 
to  ensure  convergence  of  an  optical  implementation  both 
the  SLM  gain  and  exposure  energy  must  be  chosen  in 
such  a  way  that  the  corresponding  learning  parameters 
fall  within  the  ROC. 

This  work  was  supported  in  part  by  AFOSR  (Grant 
No.  F49620-93-1-0455)  and  ARPA  (Grant  No.  F49620- 
92-J-0472).
Introduction

Optical  resonators  are  a  powerful  tool  for  object 
classification.  Using  a  volume  hologram  to  store  and 
simultaneously  probe  thousands  of  reference  images, 
they  can  take  an  input  image  and  find  the  best  matched 
reference  object  from  a  vast  stored  library. 
Developments  in  this  field  are  proceeding  rapidly  in 
numerous  laboratories^'^.  These  research  efforts  are 
confined,  however,  to  all-optical  resonator  systems 
which  suffer  from  the  speed  and  gain  limitations  of 
photorefractives,  and  from  the  lack  of  shift  invariance  of 
the  inner-product  function  performed  when  the  input 
image  is  used  to  read  a  volume  hologram. 

We  are  investigating  a  Hybrid  Electro-Optic  Resonator 
for  Image  Classification  (HEORIC),  which  retains  the 
large  number  of  independent  reference  objects  of  the  all- 
optical  systems,  but  scales  to  much  higher  speeds,  and 
performs  shift  invariant  recognition.  The  independent 
references,  which  may  number  more  than  1000,  are 
stored  in  a  volume  hologram  as  fixed  angularly 
multiplexed  holograms^.  Shift  invariance  is 
accomplished  by  using  these  reference  images  as  one  of 
the  inputs  to  a  correlator,  the  other  input  being  the 
image  to  be  recognized.  Speed  is  obtained  by  using 
dynamic  variables  whose  temporal  change  is  influenced 
only  by  electrical  and  acoustooptic  time  constants  and 
not  by  photorefractive  or  spatial  light  modulator  time 
constants.  This  allows  the  system  to  perform  image 
classification  on  the  microsecond  time  scale  rather  than 
the  millisecond  scale  required  of  the  all-optical 
approaches. 
System description

The  system  is  shown  in  figure  1.  The  correlator  is  the 
rightmost portion of the resonator. The input image is
Fourier  transformed  and  interferometrically  detected  on 
the  write  side  of  the  optically  addressed  spatial  light 
modulator  (OASLM).  The  recorded  pattern  is  multiplied 
by  the  Fourier  transform  of  images  from  the  stored 
patterns  in  the  volume  hologram.  The  product  is  inverse 
transformed  giving  the  correlation  at  the  "object 
position"  CCD  output.  This  portion  of  the  system 
constitutes  a  standard  correlator.  This  is  used  in  the 
hybrid  electro-optic  resonator  for  image  classification 
(HEORIC)  to  simultaneously  correlate  the  incoming 
signal  with  a  bank  of  reference  objects. 

The  system  shown  in  figure  1  is  a  positive  feedback 
loop.  We  use  a  broad  band  comb  of  frequencies  to 
initially  excite  the  Bragg  cell,  diffracting  a  small 
amount  of  optical  power  into  each  of  the  resonator 
modes. Each of these deflected beams carries a Doppler
shift that is proportional to its deflection angle and equal
to the frequency used to drive the Bragg cell. Each beam
from the Bragg cell reads a different volume hologram,
diffracting out the reference image corresponding to that
read angle; each reference image still oscillates at the
Doppler frequency shift used to read it. These
frequency  shifts  become  "tags"  keeping  track  of  the 
separate  images.  The  images  are  all  simultaneously 
Fourier  transformed  onto  the  OASLM  where  they 
multiply  by  the  Fourier  transform  of  the  system  input. 
This performs the correlation of the input with all
possible reference class images in parallel. This product
is inverse Fourier transformed onto a segmented
heterodyne photodetector, forming the correlation
peaks. Since the images were tagged with the Doppler
shift frequencies from the Bragg cell, the correlation
peaks have those same Doppler shifts. A custom
segmented heterodyne detector is used to reconstruct the
original frequencies with amplitudes proportional to the
strength of the correlation. These signals are fed back to
the Bragg cell, which reconstructs the volume hologram
readout waves with updated amplitude weighting. The
strongest correlation peaks produce strong heterodyne
frequency components that read their own reference class
images with an increased strength.

Figure 1. Electrooptic resonator

The  modes  in  the  resonator  compete,  and  the  ones 
which  have  the  greatest  correlation  grow  faster  due  to 
the  increased  feedback,  quickly  consuming  the  optical 
power  available  to  the  Bragg  cell.  In  saturation,  the 
power  in  the  most  strongly  correlated  mode  suppresses 
the  remaining  modes  by  using  up  the  Bragg  cell  input 
optical  power.  In  steady  state,  almost  all  of  the  optical 
power  is  in  the  resonator  mode  corresponding  to  the 
strongest  correlation  peak.  The  recognition  system  takes 
advantage  of  this  state  with  the  use  of  two  position 
detectors  shown  as  CCDs  in  figure  1.  The  first  CCD 
samples light split off from the Bragg cell in the
Fourier plane. The position of the focused spot in this
plane identifies the class of the object. The second CCD
samples light split off from the correlation plane of the
image. The position of the dominant focused spot in
this plane gives the object position within the input frame.

Single  signal  analytic  results 

Stability  of  the  winning  resonator  mode  is  achieved 
by  providing  sufficient  small  signal  round  trip  gain  to 
make  the  off  state  unstable,  and  by  providing  a 
saturation  to  the  gain.  The  diffraction  efficiency  in 
acoustooptic  devices  saturates  as  it  approaches  full 
deflection  of  the  incident  optical  power.  For  multiple 
input  signals,  this  saturation  of  the  strongest  signal  is 
accompanied  by  a  suppression  of  the  remaining 
signals^.  This  is  the  behavior  required  for  mode 
competition  to  allow  the  system  to  converge  on  the 
single  reference  image  best  matched  to  the  input  image. 

Under  the  approximation  that  a  single  frequency 
dominates  the  power  in  the  resonator,  the  acoustooptic 
response to that single frequency is given by $\sin(v_i)$,
where $v_i$ is the Bragg cell input amplitude at this
frequency.  This  normalized  expression  gives  unity  for 
the  small  signal  gain. 

Stability for nonlinear feedback systems is illustrated
with a plot of the transfer function of half the system
overlaid with an axes-exchanged plot of the transfer
function of the other half. Since the output of the first
half serves as the input of the second half, the
self-consistent solutions for signal levels in the system are
the points where the two plots intersect. Stability of
these solutions requires that the slope of the
axes-exchanged plot is greater than the slope of the regular
plot. Figure 2 shows this plot for our system. The first
plot is the $\sin(v_i)$ Bragg cell transfer
function; the overlay with axes exchanged is the
feedback system with an assumed linear transfer
function. This linear plot, with the axes exchanged, is
shown for a strong and a weak correlation.

Figure 2. Stability and steady-state amplitude. Single
signal input-output relation for acoustooptic Bragg cell
with overlay of electrical output-input relation.
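
The graphical argument can also be checked numerically. The following is a minimal sketch, assuming the loop reduces to a discrete round-trip map in which the amplitude v is replaced by g*sin(v) on each pass, where g is a hypothetical linear feedback gain that is larger for a stronger correlation. The function names, the gain values 2.0 and 0.8, and the iterated-map stability test are illustrative assumptions, not taken from the paper.

```python
import math

def steady_state_amplitude(gain, v0=1e-3, round_trips=200):
    """Iterate the assumed round-trip map v -> gain * sin(v) from a small start."""
    v = v0
    for _ in range(round_trips):
        v = gain * math.sin(v)
    return v

def is_stable(gain, v_star):
    """Standard test for an iterated map: the fixed point v_star is
    stable when the magnitude of the map's slope there is below one."""
    return abs(gain * math.cos(v_star)) < 1.0

# Hypothetical loop gains: above unity (strong correlation), below unity (weak).
strong = steady_state_amplitude(2.0)
weak = steady_state_amplitude(0.8)

print("strong correlation settles at v =", round(strong, 4))
print("weak correlation decays to v =", weak)
print("off state stable at gain 2.0?", is_stable(2.0, 0.0))
print("on state stable at gain 2.0?", is_stable(2.0, strong))
```

With the gain above unity the off state is unstable and the amplitude settles at the nonzero intersection of the two curves; with the gain below unity the only intersection is the off state.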

The  time  constant  for  convergence  of  the  system  can 
be  estimated  using  the  exponential  growth  rate  of  power 
in  a  mode.  Assuming  an  initial  acoustooptic  diffraction 
efficiency  of  0.0001%  and  a  small  signal  round  trip  gain 
of  2,  it  takes  only  20  round  trips  to  reach  saturation 
since 0.0001% x $2^{20}$ ≈ 100%. For a loop time of one
microsecond  (typical  aperture  time  for  a  Bragg  cell)  the 
total  time  for  convergence  is  a  little  more  than  20 
microseconds.  This  is  quite  fast  for  the  simultaneous 
full-frame  correlation  with  as  many  as  1000  reference 
class  objects. 
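
The arithmetic behind this estimate can be reproduced with a short calculation. This is a sketch only: it assumes pure exponential growth up to 100% diffraction efficiency and ignores saturation effects, and the function name is hypothetical.

```python
import math

def round_trips_to_saturation(initial_efficiency, round_trip_gain, target=1.0):
    """Smallest whole number of round trips n with initial * gain**n >= target."""
    return math.ceil(math.log(target / initial_efficiency) / math.log(round_trip_gain))

initial_efficiency = 1e-6   # 0.0001% expressed as a fraction
round_trip_gain = 2.0       # small signal round trip gain from the text
loop_time_us = 1.0          # one microsecond per round trip

n = round_trips_to_saturation(initial_efficiency, round_trip_gain)
convergence_time_us = n * loop_time_us
print(n, "round trips,", convergence_time_us, "microseconds")
```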

System  dynamics 

The  dynamics  of  the  system  were  modeled  by  tracking 
the  power  in  each  portion  of  the  resonator,  including 
nonlinearities  in  the  acoustooptic  transfer  function,  a 
limiting  amplifier  in  the  electrical  feedback  line,  and 
assuming  that  the  rest  of  the  system  is  operating  in  the 
linear  regime. 

Figure  3  shows  a  typical  simulation  for  10  modes  in 
the  resonator,  with  low  gain.  We  use  random  initial 
power  in  each  mode  in  modeling  the  system  starting 
from  noise  rather  than  the  comb  function  shown  in 
figure 1, since the exact frequency of each mode is not
known  a  priori  due  to  unknown  optical  phase  shifts  and 
thermal  drift.  Each  mode  has  gain  in  proportion  to  the 
correlation with the input image, and all modes have
additional  loss.  In  this  simulation  3  modes  have  gain  in 
excess  of  loss.  The  mode  with  the  highest  gain, 
representing  the  strongest  correlation,  crosses  into  the 
saturation  region  first,  and  suppresses  the  remaining 
modes. 

In  the  simulation  of  figure  3  the  mode  that  has  the 
strongest  correlation  with  the  input  wins  in  spite  of  it 
having  a  lower  initial  power.  If  the  mode  with  lower 
correlation  with  the  input  image  has  a  random  initial 
power  much  greater  than  the  first  mode,  this  mode  may 
win. The power in the modes initially grows as
$V_i(t) = V_i(0)\,e^{a_i t}$, where $a_i$ is the net gain for the
mode, and $V_i(0)$ is the random initial power in that
mode. The amplitude in the modes deviates from this
expression when one of the modes grows above unity.
This is when it starts to suppress the remaining modes.
The mode which wins the competition for power in the
resonator is the mode which has the lowest time,
$T_i = (1/a_i)\ln(1/V_i(0))$, to reach saturation. The logarithm
in this expression provides the system with correct
behavior for a large range of initial conditions.
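
The time-to-saturation rule can be illustrated with a small calculation. The sketch below assumes each mode grows exponentially from its initial power until it reaches unity, so that the time to saturation is (1/a) ln(1/V(0)) for net gain a and initial power V(0). The gains and initial powers are hypothetical values chosen to show both outcomes described above.

```python
import math

def time_to_saturation(net_gain, initial_power):
    """Time for V(0) * exp(a * t) to reach unity; needs a > 0 and 0 < V(0) < 1."""
    return math.log(1.0 / initial_power) / net_gain

def predicted_winner(net_gains, initial_powers):
    """Index of the mode that reaches saturation first, plus all the times."""
    times = [time_to_saturation(a, v0) for a, v0 in zip(net_gains, initial_powers)]
    winner = min(range(len(times)), key=times.__getitem__)
    return winner, times

gains = [1.0, 0.8]  # mode 0 has the stronger correlation (hypothetical values)

# Mode 0 starts 10x lower in power but still saturates first.
winner_a, times_a = predicted_winner(gains, [1e-6, 1e-5])

# Mode 0 starts 100x lower in power; the weaker mode now saturates first.
winner_b, times_b = predicted_winner(gains, [1e-6, 1e-4])

print(winner_a, [round(t, 2) for t in times_a])
print(winner_b, [round(t, 2) for t in times_b])
```

Because the initial power enters only through a logarithm, a tenfold disadvantage in starting power costs the stronger mode about 2.3 time units here, which its larger gain more than recovers; a hundredfold disadvantage does not.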

Heterodyne  Detector 

Heterodyne  detection  of  an  entire  correlation  function 
does  not  provide  discrimination  of  strong  and  weak 
correlations  due  to  the  aggregate  power  in  the 
correlation  sidelobes.  Correlations  are  typically 
thresholded  to  provide  a  nonlinearity  for  discrimination. 

In  this  system,  we  could  use  a  segmented  detector, 
followed  by  a  power  law  nonlinearity  and  sum  the 
signals  from  each  segment.  This  sum  must  then  go 
through  the  inverse  power  law,  in  order  to  provide  small 
signal  gain  to  make  the  off  state  unstable. 

An  alternate  technique  is  to  use  a  segmented  detector 
with  only  one  section  of  the  detector  hooked  up  to  the 
feedback  system  at  any  one  time.  This  provides  the 
appropriate  discrimination  since  the  entire  sidelobe 
structure  of  the  correlation  is  not  fed  back  to  the 
system^.  The  resonator  goes  into  oscillation  when  the 
correlation peak falls on the active detector. The system
then has a restricted field of view corresponding to the
location of the active detector, and the detector segments
are cycled on one at a time.

Conclusion 

A  new  electrooptic  resonator  for  rapid  image 
classification  versus  a  bank  of  reference  images  has  been 
introduced.  An  angularly  multiplexed  volume  hologram 
is  used  to  store  up  to  1000  images  which  are 
simultaneously  read  out  using  Doppler  shifted  beams 
deflected  by  an  AO  deflector.  All  of  these  images  are 
correlated  against  the  input  simultaneously  to  provide 
heterodyne  detected  gain  coefficients  for  the  feedback 
resonator. 

We  have  modeled  the  system  analytically  and 
numerically.  Simulations  of  the  resonator  show  the 
competition  between  stored  reference  images,  and  the 
winner-take-all  nature  of  the  system.  For  sufficient 
input  image  intensity  and  electronic  gain,  the  system 
off  state  is  unstable,  and  the  full  power  resonant  state 
with  the  proper  recognized  object  is  the  resulting  stable 
state. 
Introduction

Fast  analog-to-digital  (A/D)  converters  are  important  in  a  number  of  applications.  Several 
systems  have  been  proposed  for  fast  A/D  converters  using  optical  technology''^.  The  most 
common  types  of  converters  are  the  successive  approximation  and  Flash  converters.  In  a  Flash 
converter  there  is  a  separate  comparator  for  each  possible  output  bit  code.  Each  comparator  is 
biased  with  a  reference  level  that  is  a  specific  increment  of  the  full  scale  value.  Since 
comparators  in  a  Flash  converter  operate  in  parallel,  this  architecture  is  intrinsically  fast. 
However, as the accuracy requirements increase, the number of comparators increases as $2^N$,
where N is the number of bits. In the case of an 8 bit Flash ADC, there are 256 comparators.
The 256 comparator output signals are routed to a decoder circuit that produces an 8 bit digital
word.  The  schematic  for  a  Flash  converter  is  shown  in  Fig.  1.  The  focus  of  this  paper  is  on  a 
new  implementation  of  an  8  bit  Flash  converter  utilizing  optical  technologies. 

Overview  of  Optical  A/D  Converter  Architecture 

The  front  end  of  the  optical  A/D  converter  consists  of  optical  reference  and  input  signals.  These 
signals  are  generated  by  laser  diodes  illuminating  input  and  reference  holograms.  The 
comparison  operation  is  implemented  by  an  opto-electronic  device.  One  possibility  is  to  use 
FET-SEED  devices  as  comparators’.  In  this  case,  the  input  hologram  replicates  the  input  signal 
onto  the  signal  input  diodes  of  256  FET-SEED  comparators;  the  reference  hologram  produces 
256  gray  levels  that  provide  the  trigger  level  for  each  of  the  corresponding  comparators.  The 
digital  signal  value  is  determined  by  the  position  in  the  FET-SEED  array  where  the  output 
changes  from  high  to  low  reflectance.  To  decode  this  position  the  output  of  the  FET-SEED  array 
is  duplicated,  shifted,  and  summed  onto  a  detector  array.  The  signals  from  these  detectors  are 
used  to  drive  an  array  of  vertical  cavity  surface  emitting  lasers  (VCSEL’s).  The  response  of 
these  lasers  is  such  that  only  one  laser  will  be  on,  corresponding  to  the  position  of  transition  in 
the comparator array. Lookup table holograms, one for each VCSEL, are used to generate the
digital  representation  of  the  signal  level.  Thus,  the  single  laser  that  is  turned  on  will  illuminate  a 
hologram  that  generates  the  correct  digital  bit  pattern.  This  optical  bit  pattern  can  be  detected  by 
photoconductive  detectors  converting  the  signal  to  a  digital  electrical  form.  Figure  2  shows  the 
architecture  of  this  system. 
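
The signal flow described above can be mimicked in software to make the decoding step concrete. The following is a minimal sketch, not a model of the optical hardware: it assumes uniformly spaced reference levels, treats a comparator as "high" when the input is at or above its reference, pads the shifted copy with a zero at the array edge, and takes the lit laser to be the position where exactly one of the two summed signals is high. All names are hypothetical.

```python
def flash_convert(signal, n_bits=8, full_scale=1.0):
    """Software analogue of the comparator array, shift-and-sum decoder, and lookup table."""
    levels = 2 ** n_bits
    signal = min(max(signal, 0.0), full_scale)  # keep the input inside the assumed range

    # Reference "gray levels": equal increments of the full scale value.
    refs = [full_scale * k / levels for k in range(levels)]

    # Comparator outputs form a thermometer code: ones, then zeros.
    thermo = [1 if signal >= r else 0 for r in refs]

    # Duplicate, shift by one element, and sum onto the "detector array".
    shifted = thermo[1:] + [0]
    summed = [a + b for a, b in zip(thermo, shifted)]

    # Only the element whose sum equals 1 sits at the high-to-low transition.
    one_hot = [1 if s == 1 else 0 for s in summed]
    position = one_hot.index(1)

    # Lookup table: transition position -> bit pattern, most significant bit first.
    word = [(position >> b) & 1 for b in reversed(range(n_bits))]
    return position, word, sum(one_hot)

position, word, lasers_on = flash_convert(0.5)
print(position, word, lasers_on)
```

In this model exactly one element of the one-hot array is set for any in-range input, which mirrors the requirement that only one laser be on at a time.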

Computer  Generated  Holograms 

One  of  the  key  issues  for  this  system  is  whether  the  optical  input  and  reference  signals  can  be 
produced with sufficient accuracy for this application. The Gerchberg-Saxton Preconditioned
Random  Search  (GSPRS)*  method  was  used  to  design  multi-level  phase  holograms  for  this 
application.  In  designing  the  computer  generated  holograms  there  are  several  important 
considerations.  These  include:  the  diffraction  efficiency,  the  accuracy  of  the  intensity  produced 
by  the  hologram,  the  space-bandwidth-product  of  the  hologram,  the  number  of  phase  levels  that 
will  be  used  in  fabricating  the  hologram,  and  the  geometry  of  the  connection  pattern.  The  results 
thus  far  indicate  that  better  than  8  bit  performance  can  be  achieved  by  64  level  phase  holograms 
with  1024x1024  pixels.  The  reconstructed  output  of  the  gray  scale  hologram  is  shown  in  Fig.  3. 

Electro-optic  Comparator 

There  are  several  possibilities  for  the  design  of  an  opto-electronic  comparator.  One  technology 
that  is  currently  available  for  this  purpose  is  the  FET-SEED.  This  device  has  two  integrated 
photodiodes,  a  FET  amplifier,  and  either  one  or  two  multiple  quantum  well  (MQW)  modulators. 
The two photodiodes coupled to the FET effectively act as a differential amplifier driving the
MQW modulator. The modulator will either be in a high or low reflectance state depending on
which input is higher. Although fairly high differential power levels are required to yield a fast
response, this FET-SEED device is being investigated as a comparator to demonstrate the basic
principle of the system.

Optical  Decoder 

The  optical  decoder  determines  where  in  the  SEED  array  the  output  reflectance  changes  from 
high  to  low.  This  is  accomplished  by  an  optical  system  that  produces  an  image  of  the  output 
plane  plus  a  shifted  replication  of  the  output.  Essentially  this  is  an  optical  system  with  a  point 
response  consisting  of  two  delta  functions  separated  by  the  spacing  of  the  SEED  elements.  The 
summed  optical  signal  will  illuminate  a  heterojunction  phototransistor  that  drives  a  VCSEL 
array.  The  response  of  the  VCSEL  is  such  that  lasing  will  occur  only  when  one  of  the  summed 
signals  is  high.  The  vertical  cavity  laser  consists  of  a  quantum  well  gain  region  enclosed  by  a 
cavity  and  surrounded  by  quarter  wave  stack  mirrors.  A  micro-lens  array  is  used  to  collimate 
the light coming out of each VCSEL. The VCSEL array will provide the illumination to an
array  of  optical  lookup  table  holograms.  Since  only  one  laser  in  the  VCSEL  array  will  be  lasing 
at  a  time,  only  one  hologram  will  be  illuminated.  The  lookup  table  holograms  are  designed  to 
produce  an  8  bit  digital  word  represented  by  high  or  low  intensity  values  in  the  appropriate  bit 
position.  The  8  bit  optical  word  will  illuminate  an  array  of  photoconductive  switches  to  convert 
the  signal  into  digital  electronic  form.  Design  of  the  optical  lookup  table  has  begun,  and  will  be 
tested  with  an  electrically  addressable  VCSEL  array. 

Conclusion 

Modeling and experimental verification are being conducted on all aspects of this converter. The
CGH's have been designed and sent out for fabrication. Analyses of the aberrations and
wavelength dependence are being done in CODE V and Zemax. The SEED device is being
modeled in PSPICE and experimentally tested as a comparator. Also, the optical lookup table
will be demonstrated with an electrically addressable VCSEL array coupled to an array of
holograms.


Fig. 1
Fig. 3
We recently presented a rigorous statistical analysis of a generic three-plane optical
processor whose architecture is common to a number of information-processing systems including
optical correlators, optical interconnects, and optical linear algebra processors [1, 2]. We
established the statistics of the detector output voltage v(t), which is the signal of ultimate interest
for this processor, without confining ourselves to a specific set of devices. In particular, we
found the conditional characteristic function of v(t) to be of the form
where e is the electronic charge, is the detector integration time, q is the random gain in the
photodiode with a probability density function $p_Q(q)$, $f(t)$ is the photon-to-voltage impulse
response of the detection and post-processing electronics, r(t) is the stochastic rate process due

to  the  incident  field,  p  is  the  random  dark  excitation  rate,  and  Gy^  is  the  variance  of  the  Gaussian 

zero-mean  thermal  noise  voltage. 

We  then  proceeded  to  insert  statistical  models  for  popular  optoelectronic  devices  into  this 
general  formalism  in  an  effort  to  obtain  system  output  statistics  for  various  combinations  of 
sources,  modulators,  and  detectors  [2].  In  particular,  we  considered  semiconductor  laser  and 
light-emitting  diodes,  an  ideal  noiseless  spatial  light  modulator  and  a  hypothetical  Gaussian 
random complex-amplitude screen, and an ideal photon counter as well as semiconductor p-i-n
and  avalanche  diodes.  The  propagation  scenarios  considered  were  the  ideal  geometrical-optics- 
limit  free-space  propagation  and  a  simple  single-lens  imaging  system.  In  most  practical  cases 
of  interest,  the  output  probability  distribution  was  found  to  be  reasonably  close  to  Gaussian. 
Furthermore,  the  noise  at  the  processor  output  was  shown  to  be  signal-dependent  in  all  cases. 
This  dependence  can  be  expressed  in  an  analytical  form  as 

$$\sigma_v^2 = a\,m_v^2 + b\,m_v + c,$$

where $m_v$ and $\sigma_v^2$ are the mean and variance of the output voltage, which are respectively
associated with the signal and noise portion of the processor output, and a, b, and c are constants.

In  this  paper,  we  shall  report  on  the  potential  accuracy  improvement  offered  by  optimal 
detection-  and  estimation-theoretic  techniques  applied  to  this  general  observation  model  [2]. 
Toward  this  end,  we  shall  start  by  defining  the  computational  accuracy  of  a  processor  as  the 
signal resolution it affords at its output while simultaneously satisfying an average
probability-of-error criterion. This signal resolution can be quantified in terms of the maximum number of
identifiable signal levels L within the output dynamic range or, equivalently, the
maximum number of bits n, where $n = \log_2(L)$. For a meaningful expression of processor
performance,  both  accuracy  (i.e.,  number  of  levels  or  bits)  and  precision  (i.e.,  probability  of  error 
per  level  or  bit  error  rate)  should  be  specified.  It  should  be  intuitively  obvious  that,  within  a 
fixed  dynamic  range,  these  two  quantities  will  be  inversely  related. 

Formally, for a given maximum tolerable average probability of error per signal level $P_e$
and for equal a priori signal level probabilities $P(v_i) = 1/L$, $i = 1, 2, \ldots, L$, the maximum
attainable accuracy can be found by solving for the maximum value of L in the equation

$$\frac{1}{L}\sum_{i=1}^{L}\left[\sum_{j=1}^{i-1}\int_{z_{j-1}}^{z_j} p_V(v \mid v_i)\,dv + \sum_{j=i+1}^{L}\int_{z_{j-1}}^{z_j} p_V(v \mid v_i)\,dv\right] = P_e, \qquad p_V(z_i \mid v_i) = p_V(z_i \mid v_{i+1}),$$

where $p_V(v \mid v_i)$ is the level-conditional probability density function of v(t), and the choice of
signal levels $v_i$, $i = 1, 2, \ldots, L$, and decision thresholds $z_i$, $i = 0, 1, \ldots, L$, subject to the constraints

comprise the optimal partitioning
scheme.  Thus,  we  shall  first  consider  a  multiple-hypothesis  (MH)  testing  approach  to  the 
solution  of  this  problem  [3],  whereby  a  bank  of  discriminant-function  calculators  will  be  used 
to  determine  the  membership  of  each  observation  with  respect  to  the  L  optimally  chosen  decision 
regions.  This  will  lead  us  to  a  Lloyd-Max-type  iterative  algorithm  [2,  4]  for  determining  the 
optimal locations of the signal levels $v_i$ and decision thresholds $z_i$. The maximum value of L will
then  be  obtained  as  a  by-product  of  this  procedure. 

Alternatively,  we  can  formulate  the  problem  in  the  context  of  parameter  estimation  theory. 
The  two  fundamental  techniques  here  are  the  maximum  likelihood  (ML)  and  Bayesian  strategies 
[3].  The  former  simply  yields  the  most  likely  value  of  the  parameter  as  the  optimal  estimate, 
which is nothing but the location of the maximum of the likelihood function $p(\mathbf{v} \mid m_s)$. In the latter
approach, meanwhile, we ascribe an a priori distribution $p(m_s)$ to the signal we wish to estimate,
which is then transformed into the a posteriori distribution

$$p(m_s \mid \mathbf{v}) = \frac{p(\mathbf{v} \mid m_s)\,p(m_s)}{\int p(\mathbf{v} \mid m_s)\,p(m_s)\,dm_s}$$

via Bayes’s rule upon obtaining the sample vector $\mathbf{v}$. Depending upon the optimality criterion
used, the parameter estimate is then given by the mean, median, or the location of the maximum
of $p(m_s \mid \mathbf{v})$ [3]. In this approach, the achievable accuracy will be quantified by the Cramer-Rao
lower  bound  on  the  variance  of  the  estimate,  which  offers  us  a  tradeoff  opportunity  between 
accuracy  and  speed  [2]. 

The  direct  application  of  these  classical  techniques  is  seriously  hindered  by  the  signal 
dependence  of  the  noise  at  the  processor  output.  An  ingenious  way  to  get  around  this  difficulty 
is  to  use  proper  normalizing  transforms  that  can  potentially  stabilize  high-order  moments,  such 
as the variance and skew, of the underlying observations [5]. For the specific form of signal-noise
dependence exhibited by optical processors, we shall present the exact form of the
variance-stabilizing transformation that would help us remove this dependence from our observations, thus
facilitating  the  use  of  the  much  simpler  forms  of  these  techniques  for  the  signal-independent 
noise  case  [2]. 
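
To illustrate how a variance-stabilizing transformation works, the sketch below uses the standard first-order (delta-method) construction, in which the transform is the integral of 1/sigma(m) over the mean m. It assumes the output variance is a quadratic function of the mean with constants a, b, and c, with a > 0 and 4ac > b^2, so that the integral has the closed form asinh((2am + b)/sqrt(4ac - b^2))/sqrt(a). This is a textbook construction and not necessarily the exact form presented in [2]; the constants, the Gaussian noise model, and the function names are hypothetical.

```python
import math
import random

def sigma(m, a, b, c):
    """Assumed noise model: standard deviation as a function of the mean."""
    return math.sqrt(a * m * m + b * m + c)

def stabilize(v, a, b, c):
    """Integral of dv / sigma(v); closed form valid for a > 0 and 4ac > b^2."""
    d = math.sqrt(4.0 * a * c - b * b)
    return math.asinh((2.0 * a * v + b) / d) / math.sqrt(a)

def transformed_variance(mean, a, b, c, n=20000, seed=0):
    """Monte Carlo variance of the transformed output for Gaussian noise."""
    rng = random.Random(seed)
    s = sigma(mean, a, b, c)
    t = [stabilize(rng.gauss(mean, s), a, b, c) for _ in range(n)]
    mu = sum(t) / n
    return sum((x - mu) ** 2 for x in t) / (n - 1)

A, B, C = 1e-3, 1e-3, 1e-2  # hypothetical constants satisfying 4ac > b^2
raw = {m: sigma(m, A, B, C) ** 2 for m in (0.0, 1.0, 10.0)}
stabilized = {m: transformed_variance(m, A, B, C) for m in (0.0, 1.0, 10.0)}

for m in raw:
    print(m, "raw variance", round(raw[m], 4), "after transform", round(stabilized[m], 3))
```

The raw variance changes by roughly an order of magnitude across the three signal levels, while the variance after the transform stays close to one, which is what allows the signal-independent-noise techniques to be applied.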

By  applying  optimally  tailored  detection  and  estimation  schemes  with  the  help  of 
normalizing  transforms  to  our  generic  discrete  analog  optical  processor,  we  shall  show  that  the 
parameter  estimation  techniques  are  superior  to  the  MH  testing  approach  with  respect  to  the 
number  of  bits  of  achievable  accuracy,  especially  if  one  is  willing  to  sacrifice  throughput  for 
accuracy [2]. Specifically, in the former approach, the number of bits increases steadily with the
number of samples taken while remaining relatively constant in the latter approach, as shown in
the figure below. However, the receiver, or classifier, structure for the MH testing approach is
considerably  simpler,  and  hence  makes  it  more  attractive  if  fast  and  low-cost  enhancement 
techniques  are  more  desirable.  The  amount  of  enhancement  potentially  achievable  with  each 
technique  will  be  given  for  practical  device  parameters. 


Numbers  of  bits  of  achievable  accuracy  for  various  enhancement  techniques. 

(Y  -  V  =  10,  P=  10"^,  and  =  10“®) 
1. Introduction

Optical  matrix  processors  were  developed  to  exploit  the  high  degree  of  parallel  connectivity 
inherent  in  free  space  optical  interconnection.  Researchers  have  proposed  and  investigated  optical 
algebra processors for at least three decades^1,2. Recent advances in multi-channel modulators,
vertical cavity surface emitting laser (VCSEL) diode arrays, light valve technology, and detectors
have the potential to make these systems practical for many applications. Specifically, vector-matrix
processing can give much higher throughput than digital approaches and many applications exist
where performance can be bought with speed even at the price of accuracy or dynamic range^3,4.

PSI is developing a series of analog optical vector-matrix processors^. Paramount to their
performance is analog optical signal accuracy. This paper describes techniques to achieve real-time
optical  analog  signal  generation  through  inherently  nonlinear  physical  processes.  Prior  art  relied 
on  external  modulation  of  laser  sources  using  an  8-channel  acousto-optic  Bragg  cell.  Currently, 
PSI  is  developing  a  directly  modulated  VCSEL  source  for  a  64-channel  vector-matrix  processor. 
Analog  optical  signal  accuracy  in  both  types  of  processors  will  be  described. 


2. Analog Optical Matrix-Vector Processor

An  8-channel  8  bit  analog  optical  vector-matrix  (AOVM)  processor  system  using  external  source 
modulation  is  shown  in  Fig.  1.  The  optical  section  uses  a  single  laser  source.  The  output  of  a 
precision,  visible-wavelength,  semiconductor  laser  is  replicated  to  eight  separate  beams  to 
illuminate  the  vector  modulator.  An  eight  channel  acousto-optic  modulator  (AOM)  encodes  the 
input  vector  data  on  each  of  the  beams.  The  eight  carrier  beams  are  delivered  to  the  matrix 
modulator  by  the  fan-out  optics.  After  proper  analog  modulation  by  the  matrix  mask,  the  fan-in 
optics  delivers  the  matrix  product  terms  to  a  row  oriented  photodetector  array.  In  the  AOVM 
processor  currently  under  development,  using  direct  vector  modulation  of  a  64-element  VCSEL 
source,  the  single  laser  source,  beam  replication  optics,  and  acousto-optic  modulator  are  replaced 
with  a  VCSEL  array. 

Preprocessing  electronic  channels  use  a  digital  8  bit  to  12  bit  look  up  table  (LUT)  to  map  the  data 
values  to  linear  intensity  steps  through  the  nonlinear  vector  modulator.  The  input  8  bit  value  is 
mapped  in  real  time  to  a  12  bit  value  that  precompensates  for  the  nonlinearity  of  the  AOM  or 
VCSEL  array  transfer  function.  The  LUTs  are  independent  for  each  channel  and  give  the  extended 
dynamic  range  needed  to  invert  the  nonlinear  transfer  function. 
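
As a rough illustration of why the table maps 8 bits to 12 bits, the sketch below builds a precompensation LUT for an idealized sine-squared modulator. The ideal transfer function and the function name `sin2_precompensation_lut` are assumptions made for illustration; the real tables are measured, not computed analytically.

```python
import math

def sin2_precompensation_lut(in_bits=8, out_bits=12):
    """DAC code per input value so an ideal sin^2 modulator gives linear intensity steps."""
    in_max = 2 ** in_bits - 1
    out_max = 2 ** out_bits - 1
    lut = []
    for v in range(in_max + 1):
        target = v / in_max                    # desired normalized intensity
        phase = math.asin(math.sqrt(target))   # invert I = sin^2(phase), phase in [0, pi/2]
        lut.append(round(out_max * phase / (math.pi / 2)))
    return lut

lut12 = sin2_precompensation_lut(8, 12)  # 8-bit input, 12-bit DAC code
lut8 = sin2_precompensation_lut(8, 8)    # same inversion squeezed into 8 bits
distinct12 = len(set(lut12))
distinct8 = len(set(lut8))
```

With a 12-bit output every one of the 256 input values gets its own DAC code, whereas an 8-bit output forces neighbouring inputs onto the same code where the inverse curve is shallow.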

Detection  and  postprocessing  channels  incorporate  a  switched-capacitor  integration  filter  to  reject 
broad  band  noise  in  the  detected  optical  signal.  The  digital  control  logic  circuits  manage  the 
coprocessor  timing  and  data  flow  as  well  as  provide  the  hardware  interface  to  the  host  personal 
computer.  Since  the  processor  can  process  data  at  a  rate  substantially  higher  than  the  bandwidth  of 
the personal computer interface bus, buffer memory is provided on the interface card. The on-board
memory allows the coprocessor to achieve its designed data rate of one million calculations
per second in a burst mode for up to one millisecond.

Fig. 1 Optical Vector-Matrix Coprocessor System Block Diagram


3.  Look-Up  Table  Generation 

The LUTs are generated by first scanning each of the vector channels through all (or a sampled
range) of its values. Each value of the DAC is sampled by each of the receiver channels as well as
by many redundant measurements in time. In order to compute the LUTs, these responses
are then inverted to find the DAC level associated with each of the desired analog intensity
values, including zero. This requires statistical processing because of the noise and
non-monotonic behavior of the data.

We use a histogram to bin all of the responses into 256 bins corresponding to the 256 desired
linear analog levels. The bins are calculated using the minimum and maximum responses
in a given channel's DAC scan response array. Using the minimum and maximum value, each
response to an input DAC value is assigned a bin number. In the software algorithm, the bin
number replaces the response value in memory. After assigning each DAC value a bin number, all
of the DAC values for a particular bin are examined. Essentially, the mean DAC value (centroid) is
computed for each bin. This gives an estimate of the DAC value that would produce a response in
that bin.
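
The binning and centroiding steps can be sketched as follows. This is a minimal illustration, assuming NumPy and a synthetic sine-squared scan with added noise; the name `build_lut` and the test data are hypothetical, not the authors' software.

```python
import numpy as np

def build_lut(dac_values, responses, n_levels=256):
    """Histogram-centroid inversion of a measured DAC-to-intensity scan."""
    dac_values = np.asarray(dac_values, dtype=float)
    responses = np.asarray(responses, dtype=float)
    lo, hi = responses.min(), responses.max()
    # Assign each response a bin number 0..n_levels-1 between min and max.
    bins = np.floor((responses - lo) / (hi - lo) * n_levels).astype(int)
    bins = np.clip(bins, 0, n_levels - 1)
    lut = np.full(n_levels, np.nan)        # NaN marks an unoccupied bin
    for b in range(n_levels):
        members = dac_values[bins == b]
        if members.size:
            lut[b] = members.mean()        # centroid of the DAC values in this bin
    return lut

# Synthetic scan: 12-bit DAC, 4 redundant samples per code, sin^2 response plus noise.
rng = np.random.default_rng(0)
dac = np.repeat(np.arange(4096), 4)
resp = np.sin(0.5 * np.pi * dac / 4095) ** 2 + rng.normal(0.0, 1e-3, dac.size)
lut = build_lut(dac, resp)
```

Each entry `lut[k]` is the estimated DAC value that produces analog level `k`; unoccupied bins are left as NaN rather than guessed.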


4.  Application  to  External  Laser  Diode  Modulation 

External  modulation  using  a  Bragg  cell  requires  setting  a  DAC  value  into  an  RF  mixer  using 
tone.  Bragg  modulation  involves  a  sine  squared  transfer  function  and,  at  high  levels,  roll-off  due 
to amplifier saturation. In order to compensate for these nonlinearities, a digital approach is used
to provide better noise immunity and thermal stability than a nonlinear analog network. The
acousto-optic Bragg cell RF drive electronics are biased slightly to ensure that the minimum RF
condition (through the mixer and preamp) is in the DAC range, perhaps 10 to 40 counts (least
significant bits) into the bottom of the DAC output. Values less than the minimum RF level will
produce finite optical power out of the vector modulator. These values must be excluded from the
histogram centroiding algorithm to get more accurate results at the lowest analog levels. The
centroid of the minimum bin gives the estimate for this minimum RF DAC value. All values lower
than this are subsequently excluded from further processing.

Using this approach we found some bins in the histogram were unoccupied. This left no data on
which to base an estimate. Instead of using interpolation algorithms on the incomplete LUT, a data
smoothing approach was taken on the DAC response data. A convolution algorithm is employed
that uses unequal weighting of 31 terms.
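
A 31-term unequal-weight smoothing can be sketched as below. The text does not give the actual weights, so a normalized triangular window is assumed here purely for illustration; `smooth_response` is a hypothetical name.

```python
import numpy as np

def smooth_response(responses, n_terms=31):
    """Smooth a DAC scan with an odd-length, unequally weighted kernel (triangular, assumed)."""
    weights = np.bartlett(n_terms + 2)[1:-1]   # drop the two zero end points -> n_terms weights
    weights = weights / weights.sum()          # unit gain so levels are not rescaled
    half = n_terms // 2
    # Repeat the edge samples so the output has the same length as the input.
    padded = np.pad(np.asarray(responses, dtype=float), half, mode="edge")
    return np.convolve(padded, weights, mode="valid")

x = np.linspace(0.0, 1.0, 500)
noisy = x + np.random.default_rng(1).normal(0.0, 0.05, x.size)
smoothed = smooth_response(noisy)
```

Smoothing the response before binning reduces the noise-driven scatter that leaves some histogram bins empty.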


5.  Application  to  Direct  VCSEL  Array  Modulation 

The  transfer  curve  of  a  VCSEL  device  is  similar  to  that  of  a  conventional  laser  diode,  as  shown  in 
Fig.  2.  The  onset  of  lasing  occurs  at  a  threshold  current  value  and  the  optical  power  is  nonlinearly 
proportional  to  input  current.  Using  the  LUT  approach  described  earlier,  we  have  linearized  the 
output of a 1 mW VCSEL array to within ±8 µW, as shown in Fig. 3.


Fig. 2. VCSEL LI Curve (plot title: VCSEL Array Illumination versus Current)

Fig. 3. Linearized VCSEL Output (plot title: Linearized Output from VC Laser Element)


6.  Summary 

Recent  advances  in  optoelectronic  components  make  analog  optical  vector-matrix  processors 
practical  for  many  applications.  The  performance  of  these  processors  is  critically  dependent  upon 
analog optical signal accuracy. A look-up table approach is an ideal way of achieving accurate
real-time optical analog signal generation.

This work was performed under NASA Ames Research Center contracts NAS2-1375 and NAS2-14064.
The authors would like to thank the contract technical officer, Dr. Charles Gary, for his
assistance and support in this effort.
Introduction

In the course of exploring the capabilities of optical computing, the optical analog scheme is
an important field that should be researched more actively. Optical analog computing has excellent
advantages, i.e., large data capacity, large processing capability, flexibility in data
representation, and so on. Various optical transforms, such as conventional and
fractional Fourier transforms, and optical techniques including the vector-matrix multiplier
and the matched filter are good examples of the field.

However, inherent disadvantages also exist in optical analog computing with respect to
computing accuracy, dynamic range of data, and difficulty of implementation.
Although an optical digital scheme could be an effective solution to these problems, the
sampling nature of the digital scheme reduces the native capabilities of optical computing.
In addition, the position of optical digital computing must be clarified relative to its strong
competitors, namely electronic computers.

In this paper, we consider a method for highly accurate optical analog computing
using interval arithmetic and fixed point theory. As an example of optical
implementation, two-variable simultaneous equations are studied on the optical fractal
synthesizer.

High  Accurate  Computing  Using  Interval  Arithmetic  and  Fixed  Point 

One effective way to bring high accuracy into optical analog computing is to utilize the
accumulated resources of computing science. An enormous amount of effort has been
made to improve and to guarantee the accuracy of computing executed on digital
electronic computers. Among these results, the authors use a method based on interval
arithmetic and the fixed point.

Interval arithmetic is a computational scheme in which a numerical datum
is treated as an interval including a set of real numbers, and the four fundamental rules
are defined as operations on intervals to grasp rounding error strictly. In this
arithmetic, $[a, b]$ means a closed interval of real numbers $\{x \mid a \le x \le b\}$, and a binary
operator on two intervals $X = [a, b]$ and $Y = [c, d]$ is defined as
$X * Y = \{x * y \mid x \in X,\ y \in Y\}$, with $* \in \{+, -, \cdot, /\}$.
Assigning the upper and lower bounds of the possible range of
the target numerical datum to those of the interval, we can grasp the range of the
numerical datum after any combination of the operations $*$.
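
The four interval operations can be written out in a few lines. The sketch below is a minimal illustration with a hypothetical `Interval` class; it uses ordinary floating point and does not perform the outward rounding that a rigorous implementation needs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        # Smallest result: lowest of self minus highest of other.
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        # The extremes of x*y over a box are attained at its corners.
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval(min(p), max(p))

    def __truediv__(self, other):
        if other.lo <= 0.0 <= other.hi:
            raise ZeroDivisionError("divisor interval contains zero")
        return self * Interval(1.0 / other.hi, 1.0 / other.lo)

X = Interval(1.0, 2.0)
Y = Interval(-1.0, 3.0)
total, diff, prod = X + Y, X - Y, X * Y
quot = X / Interval(2.0, 4.0)
```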

Although rounding error can be grasped with interval arithmetic, it is
required to reduce the size of the interval itself to acquire an accurate result. This
task is accomplished with a fixed point of a contraction mapping of the intervals. The
fixed point $x^*$ of a mapping $g: X \to X$ is defined by $g(x^*) = x^*$, and its existence in $X$ is
proved if $g(X) \subseteq X$, where $X$ is the interval.
Various computations can be executed with high accuracy by the above
techniques. As an example, the computation of simultaneous linear equations

$Ax = b$  ($A$: $n \times n$ matrix, $b$: known $n$-vector)  (1)

is considered [2]. Assume $R$ and $\tilde{x}$ are an approximate inverse matrix of $A$ and an
approximate solution, respectively. Then Eq. (1) can be rewritten in fixed-point
form as follows:

$R(b - A\tilde{x}) + (I - RA)x^* = x^*$  (2)

where $I$ is the unit matrix and $\tilde{x} + x^*$ provides the accurate solution of Eq. (1). Refer to
the left-hand side of Eq. (2) as $g(x^*)$ and define a mapping $g(X)$ of the interval vector $X$
as Eq. (3).

$g(X) = R(b - A\tilde{x}) + (I - RA)X$  (3)

where $X$ is an $n$-vector consisting of real intervals $X_i$. If $g(X) \subseteq X$, the fixed point $x^* \in X$
exists. To converge the interval $X$, calculate $X^{(k)}$ iteratively with Eq. (4).

$X^{(k)} = g(X^{(k-1)}) \cap X^{(k-1)}, \quad X^{(0)} = X$  (4)

As $k$ increases, $X^{(k)}$ converges to the fixed point $x^*$. Consequently, a solution with
sufficiently high accuracy can be obtained as $\tilde{x} + x^*$.
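
A numerical sketch of this interval iteration for a two-variable system is given below. The matrices, the deliberately inexact inverse `R`, and the function name `refine` are illustrative assumptions; ordinary floating point is used without outward rounding, so the result is not a rigorous enclosure.

```python
import numpy as np

def refine(A, b, R, x_approx, lo, hi, n_iter=8):
    """Iterate X <- g(X) ∩ X for the interval vector X = [lo, hi]."""
    c = R @ (b - A @ x_approx)             # constant term of g
    M = np.eye(len(b)) - R @ A             # contraction matrix I - RA
    M_pos, M_neg = np.maximum(M, 0.0), np.minimum(M, 0.0)
    for _ in range(n_iter):
        # Interval image of [lo, hi] under the point matrix M, shifted by c.
        g_lo = c + M_pos @ lo + M_neg @ hi
        g_hi = c + M_pos @ hi + M_neg @ lo
        lo, hi = np.maximum(lo, g_lo), np.minimum(hi, g_hi)   # intersection
    return lo, hi

A = np.array([[4.0, 1.0], [2.0, 3.0]])
b = np.array([1.0, 2.0])
R = np.array([[0.28, -0.09], [-0.21, 0.41]])    # rough inverse of A (assumed)
x_approx = np.array([0.1, 0.5])                 # rough solution (assumed)
lo, hi = refine(A, b, R, x_approx, np.array([-1.0, -1.0]), np.array([1.0, 1.0]))
solution = x_approx + 0.5 * (lo + hi)
```

For this example the exact solution is (0.1, 0.6); the interval width shrinks by more than an order of magnitude per iteration because the entries of I - RA are small.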

Optical  Implementation 

The computational algorithm shown in the previous section can be applied to optical
analog computing. The initial motivation for highly accurate computing was to guarantee
computation on electronic computers, but the theory is also effective for optical
analog computing. To study the capabilities of highly accurate optical analog computing,
an optical implementation of two-variable simultaneous linear equations is considered.


[Fig. 1 diagram labels: CRT, OAP #1, OAP #2, Camera, Frame Memory]
Our idea is that the optical fractal synthesizer is used to achieve highly accurate
computing with 2-D pattern processing. The optical fractal synthesizer consists of a TV
feedback path and optical affine transform processors, as shown in Fig. 1. Although
the optical fractal synthesizer was proposed to generate various kinds of fractal shapes
according to iterated function systems, its internal computation can be converted
into the same form as Eq. (2) with simple modification. For the two-variable case, the
intervals are represented effectively with a spatial pattern. As shown in Fig. 2, a set of
two intervals is expressed as a rectangle on the image plane. To cope with the dynamic range
of the intervals, a scaling mechanism is provided, which manages the correspondence
between the scale on the image and the number. By changing the scale, we can represent
an arbitrary range of numbers.

The actual processing procedure is as follows: 1) Calculating the vector $R(b - A\tilde{x})$ and the
matrix $(I - RA)$ of Eq. (3), where $\tilde{x}$ is the approximate solution. 2) Configuring one of the
optical affine transform processors according to the result of step 1. 3) Configuring the other
optical affine transform processor to output the identical image of the input. 4) Setting an
initial image containing the pattern encoded from the interval X. 5) Running the optical fractal
synthesizer until a stable image is obtained. 6) If the accuracy is not sufficient, changing the
scale of the image plane and repeating steps 1 to 5 until sufficient accuracy is achieved. The
processing on the optical fractal synthesizer is illustrated in Fig. 3.

Several comments should be added on the above procedure. 1) In step 1,
the results of the calculation are a 2-vector and a 2x2 matrix for the two-variable case.
Referring to Eq. (3), the mapping g(X) is identical to an affine transform executable on
the optical fractal synthesizer. 2) The configuring phases in steps 2 and 3 hold the
key to highly accurate processing. At this stage, several optical probe spots
are used to configure and to adjust the optical setup. However, a more sophisticated
method is required for fast configuration. 3) The original operation executed on the
results of the individual affine transform processors in the optical fractal synthesizer is
simple addition, which does not match Eq. (4). Thus, thresholding is executed in
the frame memory during the feedback. When the threshold level is set between unity
and double the unit intensity of the image, a logical AND operation can be achieved.
4) Since the proposed method is based on the spatial encoding of the intervals of two
variables, the number of variables is limited to two. To overcome this limitation, a new
encoding method should be developed.
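
The thresholding in comment 3 can be checked numerically. In the sketch below, two binary rectangle images stand for the outputs of the two affine transform processors; the array sizes, the threshold of 1.5 units, and the name `and_by_threshold` are illustrative assumptions.

```python
import numpy as np

def and_by_threshold(img_a, img_b, unit=1.0):
    """Add two unit-intensity binary images, then threshold between 1 and 2 units."""
    total = img_a + img_b                  # optical addition of the two outputs
    return (total > 1.5 * unit).astype(float) * unit

# Two overlapping rectangles, each encoding a pair of intervals.
a = np.zeros((8, 8)); a[1:6, 2:7] = 1.0
b = np.zeros((8, 8)); b[3:8, 0:5] = 1.0
out = and_by_threshold(a, b)
```

Only pixels lit in both images exceed the threshold, so the surviving rectangle is the intersection of the two interval sets, which is the operation Eq. (4) calls for.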

Conclusions 

In this paper, a new method for highly accurate optical analog computing has been
considered using interval arithmetic and fixed point theory. As an example of
optical implementation, two-variable simultaneous equations have been demonstrated
with the optical fractal synthesizer. Although the current implementation has many
limitations, we hope this paper will trigger new research on highly accurate
optical analog computing.
It was shown recently [1] that optics allows, in principle, up to 10*® bits/(cm² s) data transmission
rates with no fundamental restrictions in the information channel. This is due to the optical
degrees of freedom which are of a 3D-nature. In this paper we discuss how to utilise these optical
degrees of freedom with respect to the well-known advantages of the digital approach to general
purpose information processing.


This raises the problem of coding information in a practical way: how to match the language (in
which data are represented) with the nature of a particular information channel and to approach a
rate of transmission of information as high as possible for a given channel. In electronics there is
only one way of coding information: a sequential code. In such a code each symbol is assumed
to be in the form of a pulse of a certain time duration with a certain amplitude. A 3D optical
channel may be considered as a "2D-coordinate plane plus 1D-time" channel. Hence in this case
one may discuss the problem of coding with respect to space and time domains separately [2]. As
to the time domain there is no principal difference as compared with electronics. Regardless of
the case considered, either optical or electronic, communication lines, circuits, switching
components, etc. possess a time constant $\tau_0$ which limits the system frequency bandwidth
$\Delta\nu_0$ in accordance with the uncertainty principle constraint $\Delta\nu_0 = 1/(2\tau_0)$;
if energy $E_0$ is received by a system, it decays naturally according to an exponential law
$E = E_0 \exp(-t/\tau_0)$.


This restricts the value of both the shortest pulse duration and the shortest time interval between
two sequential pulses in order for a logical decision during a decoding process to be made correctly
for any of n distinguishable levels of signal. Correspondingly, a maximisation procedure for the
channel information capacity gives n = 2 as the preferable code basis [3]. Besides the
phenomenological similarity in sequential coding for optical and electronic channels, there is a
significant difference in the values of the frequency bandwidth for these two cases. From the
technological point of view it is rather difficult to expect the electronic bandwidth
$\Delta\nu_0$ to be higher than $10^{10}$ Hz. In contrast, with respect to the optical cases
utilising intensity logic, $\Delta\nu_0 \sim$ 10*^ is reasonable for many types of
optical switches, such as those based on soliton propagation in optical fibres.

The problem of spatial coding in 2D-coordinate space is a more complex one. For an encoded
optical image to be transmitted through an optical spatial channel, the smallest area allowed for
occupation by the spatial code symbol must be chosen in accordance with similar constraints
as for the sequential time coding, i.e. the uncertainty principle relationship must be satisfied
between the linear size $r_0$ of the spatial code symbol and the spatial frequency bandwidth of the
optical imaging system. The highest spatial frequencies which can be transmitted through an
optical system are imposed by light propagation effects and set the principal resolution limit on
the optical image.

Based on such restrictions one can select the set of the resolution cells (pixels) in the (x,y)-plane
and assign each cell with an intensity of optical signal quantized into n discrete levels:

where j = 1, 2, 3, ... This leads to the common concept of pixelation of an optical image.
Obviously, the smallest area of a single pixel of image is related to the spatial frequency
bandwidth of the channel. To calculate its area and the most favourable code basis (n) for spatial
coding, we assume that the intensities of the pixelated beams in the cross-section of the image are
given as

$I_j(r) = I_j \exp\left[-\left(\frac{r}{r_0}\right)^m\right]$.  (2)

For optical beams the exponent m in Eq. (2) can vary from 1 to many. For the Gaussian
beam m = 2. However, in some practical cases light scattering can result in a situation where the
beam spot has a diffusive shape with 1 < m < 2. In contrast, by a special arrangement it is
possible to create a so-called super-Gaussian beam with m > 2. Such a variety of choices for the light
beam shape leads to differences in the spatial domain coding as compared with the 1D-time case.
For an arbitrary combination of symbols in a 2D discrete image one needs to avoid a wrong
decoding process which may occur if a particular light pixel does not decay sufficiently within
the area of the surrounding pixels. A decoding process can be conducted correctly at least if the
intensity of a pixel lies within a window narrower than $\pm I_0/2$ away from the corresponding $I_j$-value.

The worst case obviously is then when the pixel of interest is of maximum intensity, $(n-1)I_0$,
whilst one or more of its nearest neighbours is of 0-intensity. Hence we have the condition

$(n-1)\,I_0 \exp\left[-\left(\frac{r_1}{r_0}\right)^m\right] < \frac{I_0}{2}$, which results in

$\frac{r_1}{r_0} > [\ln 2(n-1)]^{1/m}$.  (3)
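
The step from the decoding condition to Eq. (3) is short. Written out, assuming a decision window of half a quantization step $I_0/2$ and a neighbour at centre-to-centre distance $r_1$:

```latex
\begin{align*}
(n-1)\,I_0 \exp\!\left[-\left(\frac{r_1}{r_0}\right)^{m}\right] &< \frac{I_0}{2} \\
\exp\!\left[\left(\frac{r_1}{r_0}\right)^{m}\right] &> 2(n-1) \\
\left(\frac{r_1}{r_0}\right)^{m} &> \ln\bigl[2(n-1)\bigr] \\
\frac{r_1}{r_0} &> \bigl[\ln 2(n-1)\bigr]^{1/m}
\end{align*}
```

For a binary code with Gaussian beams ($n = 2$, $m = 2$) this gives $r_1/r_0 > \sqrt{\ln 2} \approx 0.83$.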

Here $r_1$ is the smallest distance allowed between the centres of two neighbouring pixelated
beams. Thus, for a total area $S$ of an optical image the number of pixelated beams is $S/r_1^2$. With an
assumption of equal a priori probabilities for the spatial code symbols, this gives the maximum
value of the information entropy represented by a spatially encoded image:

$H_{\max} = \frac{S}{r_0^2}\,\xi(n)$  (4)

with

$\xi(n) = \frac{\ln n}{[\ln 2(n-1)]^{2/m}}$.  (5)
Note that for Gaussian beams (m = 2) the function $\xi(n)$, dependent on the code basis n, is the
same as in sequential electronic coding [3]. This means that for an image consisting of pixelated
beams of Gaussian shape, the maximum information is reached for a binary encoded optical
message or for an analogue one. For various values of m the function $\xi(n)$ is shown in Fig. 1.
Interestingly, for m = 1 the function has no minimum at all and continuously decreases to 0
as the integer n goes from 2 to $\infty$. This, in particular, explains why even slight scattering
effects, leading to m < 2 in Eq. (2), are so crucial for analog image processing. It is also seen from
Fig. 1 that for super-Gaussian beams the binary code is not the optimum one and, thus, larger
basis digital representations of data allow the degrees of freedom of an optical channel to be
utilised with a better efficiency, at least in principle.
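
The behaviour described for the different beam shapes can be checked numerically. The sketch below assumes the code-basis function has the form $\xi(n) = \ln n / [\ln 2(n-1)]^{2/m}$; the function name `xi` and the sampled values of n and m are illustrative choices.

```python
import math

def xi(n, m):
    """Code-basis function: ln(n) / [ln(2(n-1))]^(2/m), for integer n >= 2."""
    return math.log(n) / math.log(2 * (n - 1)) ** (2.0 / m)

bases = [2, 3, 4, 8, 16, 64, 1000]
scattered = [xi(n, 1) for n in bases]        # m = 1: diffusive beam spot
gaussian = [xi(n, 2) for n in bases]         # m = 2: Gaussian beam
super_gaussian = [xi(n, 4) for n in bases]   # m = 4: super-Gaussian beam
```

Under this form, the m = 1 values fall steadily with n, the Gaussian values dip below their n = 2 value and then climb back towards it for large n, and the super-Gaussian values grow with n, in line with the three cases discussed above.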


Fig. 1. Plots of the code-basis-dependent functions $\xi(n)$ for 1D sequential coding (a)
and for 2D optical spatial coding (b).

It must be pointed out that the concept of pixelation of a light image is not only a problem of the
pixelation of an optical intensity distribution itself, which can be made with a resolution close
to the diffraction limit. A pixelated image needs to be processed. This implies that pixelated
beams must interact in some way with an optical device, either uniform or pixelated itself. If a
non-local mechanism of interaction takes place then one needs to increase the minimum distance
between pixels in order to avoid undesirable cross-talk. Also, technological restrictions may
not allow the interpixel distance to be decreased down to its principal limit: such a case occurs
with SEED arrays where a larger part of a pixel area is non-transparent optically and is occupied
by electronic components of a smart pixel.
1. Introduction

We have proposed and studied the synthesis of the optical coherence function by
utilizing direct frequency modulation of a laser diode [1-4]. We applied this technique to
2-dimensional (2-D) or 3-D optical information processing systems [3,4]. It provides optical
processing functions such as slice extraction at an arbitrary depth from a 3-D
semitransparent object. In this information processing system, holography is used to select
the interference component from the other components. The use of a silver-halide hologram
prevented real-time processing in our previous study [3,4]. In this presentation, we
demonstrate real-time processing by using a liquid crystal spatial light modulator as a real-time
hologram.

2.  Principle  and  functions  of  the  system 

The complex coherence function $\gamma$ is calculated as the Fourier transformation of the
power spectrum of the light source [5]. By modulating the optical frequency with an appropriate
waveform, the power spectrum is synthesized in the sense of time-averaging, and then the
optical coherence function can be synthesized. When the laser frequency is modulated with the
waveform shown in Fig. 1(a), the coherence function having the shape shown in Fig. 1(b) can
be synthesized [1-4]. The shape of the degree of coherence $|\gamma|$ becomes a periodic delta-like
function when the number of frequency pairs N is large enough. This means that the light
with the frequency modulation in the waveform shown in Fig. 1(a) can interfere only when a
specific optical path length difference exists.
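
The delta-like shape can be reproduced with a small numerical sketch. It assumes the time-averaged power spectrum consists of 2N equal-power lines at offsets of ±(k - 1/2) times the frequency spacing (k = 1..N), and that an arm length difference z corresponds to a round-trip delay 2z/c; the modulation waveform details and the name `degree_of_coherence` are assumptions, not taken from the text.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def degree_of_coherence(z, f_sep, n_pairs):
    """|gamma| versus arm length difference z for 2*n_pairs equal-power spectral lines."""
    tau = 2.0 * np.atleast_1d(z) / C                  # round-trip delay
    k = np.arange(1, n_pairs + 1)
    offsets = np.concatenate([-(k - 0.5), k - 0.5]) * f_sep
    # Fourier transform of a line spectrum: average of the phasors exp(i 2 pi f tau).
    phasors = np.exp(2j * np.pi * np.outer(tau, offsets))
    return np.abs(phasors.mean(axis=1))

f_sep = 333e6                       # frequency spacing in Hz
z_peak = C / (2.0 * f_sep)          # first non-zero coherence peak
vals = degree_of_coherence(np.array([0.0, z_peak / 2.0, z_peak]), f_sep, 6)
```

Under these assumptions the coherence peaks repeat every c/(2 f_sep) in arm length difference and vanish midway between peaks, so only reflections near a peak interfere with the reference wave.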

The information processing system is shown in Fig. 2. A laser beam is divided into the
reference and the object wave after a beam expander. Both waves are incident on the hologram.
When the coherence function is synthesized to have the delta-function-like shape as described
above, for example, only the wave reflected at the plane corresponding to the peak of the
coherence function can interfere with the reference wave. The modulation parameters can be set
so that there exists only one peak in the object. Thus, we have selective interference carrying
the information corresponding to only one plane. Holography is one way to select
only the interference component. Only the interference component is reconstructed as the
diffracted wave from the hologram.

The  functions  of  the  optical  information  processing  depend  on  the  shape  of  the 
coherence  function.  Selective  extracting  or  masking  of  2-D  information  from  a  3-D  object  can 
be  performed[4]  by  using  delta-function-like  or  notch  shaped  coherence  function,  respectively. 
Also,  the  extracting/masking  position  can  be  changed  by  the  modulation  parameter  of  the  laser 
diode.  It  does  not  require  any  mechanically  moving  part. 

3.  Experiment 

The selective extraction of 2-D information from a 3-D object was demonstrated experimentally. Two
mirrors were set in different positions as a simplified 3-D object. The arm length difference
between the reference and each object mirror was set to z = 45 cm and 40 cm, respectively. Each
object mirror has the letter '9' or '5' on it, respectively, for identification. The light source is a
780 nm wavelength F-P type semiconductor laser diode. The modulation waveform was produced by
an arbitrary waveform generator. The number of frequency pairs N was 6. The frequency
spacing $f_{sep}$ was set to 333 and 375 MHz, corresponding to z = 45 and 40 cm, respectively.
The liquid crystal spatial light modulator (HAMAMATSU, PAL-SLM) is used as a real-time
hologram. Its spatial resolution is higher than 50 lp/mm, but much lower than that of silver halide.
This limits the separation angle between the reference and the object wave to within about 2°. The
sensitivity is optimized at λ = 633 nm, but the light source chosen for this experiment is at
780 nm because of the coherency of the semiconductor laser diode. At this wavelength, about
400 µW/cm² is required for π modulation. The reconstructed images were degraded
by diffraction. To improve them, a lens is set so that the object light is imaged onto the
liquid crystal spatial light modulator. The reconstruction waves are focused by another lens and
filtered at the focal plane to cut the 0-th order light completely. The rise time of the liquid crystal is
30 msec, while one period of modulating the laser frequency is 2.4 msec in this experiment.
Therefore, the response time of this system is determined by that of the liquid crystal spatial light
modulator.
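
The quoted pairs of frequency spacing and arm length difference are consistent with the relation z = c/(2 f_sep). That relation is inferred here from the numbers rather than stated in the text, so the short check below should be read as an assumption-based sanity test; `selected_depth_cm` is a hypothetical helper.

```python
C = 299_792_458.0  # speed of light in m/s

def selected_depth_cm(f_sep_hz):
    """Arm length difference (cm) selected by a frequency spacing, assuming z = c / (2 f_sep)."""
    return 100.0 * C / (2.0 * f_sep_hz)

depths = {f: selected_depth_cm(f) for f in (333e6, 375e6)}

# Number of laser modulation periods that fit in one liquid crystal rise time.
periods_per_response = 30e-3 / 2.4e-3
```

The two spacings give about 45 cm and 40 cm, and roughly a dozen modulation periods are averaged within the modulator's rise time, which is why the liquid crystal, not the laser modulation, sets the system response time.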

The  results  of  the  experiments  are  shown  in  Fig.  3.  They  are  reconstructed  images  of 
the  real-time  holography.  Figure  3(a)  is  recorded  and  reconstructed  without  the  modulation  to 
synthesize  the  coherence  function,  in  which  both  the  letters  '9'  and  '5'  are  seen.  Figures  3(b) 
and  (c)  show  the  results  with  the  modulation  to  synthesize  the  optical  coherence  function  with 
the  delta-function-like  shape.  In  each  figure,  only  one  letter  '9'  or  '5'  can  be  seen. 
4. Conclusion

A real-time information processing system using the synthesis of the optical
coherence function has been constructed. By synthesizing the delta-function-like coherence
function, selective extraction of 2-D information from a 3-D object was successfully carried out
in real time.
Fig. 1 Synthesis of the optical coherence function. (a) Modulation waveform of the laser
frequency, (b) synthesized optical coherence function.

Fig. 2 The optical information processing by the synthesis of the optical coherence
function using real-time holography.

Fig. 3 Selective extracting of 2-D information from the 3-D object: reconstructed images of
holography. (a) Without the modulation, (b), (c) with the modulation to synthesize
the delta-function-like coherence function.
Optical interconnections for smart pixel arrays [1] require the use of imaging optics that can handle
large fields at high resolution. For that purpose, the use of "hybrid" imaging systems has been
proposed and demonstrated [2-4]. The basic setup of a hybrid imaging system that combines
conventional 4-f imaging with microoptic lenslet arrays is shown in Figure 1.


Figure 1: Space-invariant hybrid imaging setup (after [2]). For simplicity, geometrical optical paths
are shown.

The lenslets in arrays A1 and A2 have the task to reduce the numerical aperture of the light beams
emitted from the point sources in the input array and to provide tight focussing in the output plane,
respectively. It is therefore possible to use imaging lenses L1 and L2 with relatively large f-numbers.
This results in a significant reduction of the aberrations [4]. The price to pay is that the setup as shown
in Fig. 1 is limited to space-invariant imaging. It would be of interest, however, to take advantage of
the properties of a hybrid imaging system for other applications, too. For this purpose we consider
modifications of the basic setup by introducing additional degrees of freedom in the design of the
lenslet arrays. These are the focal length f of the lenslets and the deviation angle α of the collimated
beams [5] (see Fig. 2). We consider three cases where we make use of these parameters.

Figure 2: Combination of a microlens and a deflection grating.
1. Space-variant interconnections:

Figure 3a shows a setup with only one imaging lens L in a 2F-2F configuration, where F is the focal
length of the imaging lens L. In order to provide proper imaging, the light beams emerging from lenslet
array A1 are focussed and deflected through the center of L. L acts as a field lens and forms an image of
A1 in the plane of A2. If its lateral diameter is considerably smaller than the aperture of the system,
one can consider placing several imaging lenses in plane L (Fig. 3b). By appropriate design of the
deflection angles it is now possible to implement space-variant interconnections such as the 2-D
crossover interconnect [6] as shown in Fig. 3. This setup resembles a multiple-aperture implementation of
the perfect shuffle [7].


Figure 3: a) Hybrid imaging system with one imaging lens L. b) Space-variant interconnect with two imaging lenses in array configuration (LA).


Figure 4: Unfolded off-axis hybrid imaging setup.

Figure 5: Section of the imaging setup with a microlens array with variable focal lengths.

2. Design of off-axis or folded imaging setups:

As described above, in a hybrid imaging setup the focal power of the system is split between the
microlens arrays and the imaging lens L, which reduces the effect of aberrations. This in turn is
important for off-axis or folded planar optical imaging systems as used for the packaging of 3D optical
systems. Nevertheless, to achieve good image quality the setup as well as the lenses need to be
designed properly. A planar optical version of a 2F-2F imaging system as shown in Fig. 3a was discussed
in ref. [8]. The two main aberrations are astigmatism and image field curvature (Fig. 4). Optical designs
which compensate for astigmatism have been demonstrated for diffractive [9] as well as refractive
lenses [10]. Here, we suggest using microlens arrays with varying focal lengths to compensate for the
remaining field curvature. This is illustrated in Fig. 5. Each pixel of the input array is imaged with a
specific focal length which varies with its position in the input array.

3.  Planar  optical  correlator  using  hybrid  imaging  setup: 

The use of planar optics was suggested for building rugged correlators [11]. Again, a hybrid imaging
system can be used to reduce aberrations as discussed above. However, this causes a second problem
connected with the basic hybrid imaging setup (Fig. 1). Here, we have a rigid scheme where a specific
lenslet in array A2 is allowed to receive light only from a single corresponding position in A1. However,
for spatial filtering it is necessary to allow light from different input positions to end up at a specific
output position. A simple solution is found by omitting the second lenslet array A2 (f → ∞) and using a
low-resolution detector in that plane instead. As the pitch in the lenslet arrays can be quite small (i.e.
on the order of 10 μm), reasonably large space-bandwidth products can still be obtained.


Figure  6:  Optical  correlator.  F:  spatial  filter. 
Fuzzy logic [1] has potential applications in fields such as pattern recognition and process control. Since Liu
first introduced an optical fuzzy logic processor utilizing a lens-array-based multiple imaging system [2], many other
systems have been proposed and demonstrated. Most of the early implementations were based on the principle of
shadow-casting, with spatially encoded patterns superimposed on each other by use of either a light source
array or a lens array [2]. To obtain the correct output of the fuzzy logic maximization (or minimization) operations,
thresholding devices were needed in some systems. These thresholding devices, as well as the complex encoding
patterns, make the systems complicated. Other systems utilized a complex encoding scheme which resulted in an
output pattern different from the input patterns. Thus, the encoding scheme proposed for two-input fuzzy logic
operations was difficult to extend to multiple-input operations [3,4]. In this paper, we propose and demonstrate a
novel optical fuzzy logic processor based on four-wave mixing in photorefractive crystals. Specifically, the
recording of light-induced gratings is utilized to achieve minimization operations, while the readout of degenerate
gratings is utilized to achieve maximization operations. Our system has several advantages, including a simple data
encoding scheme, full parallelism, high speed, high accuracy, and a simple architecture (no thresholding devices).

To implement fuzzy logic operations in photorefractive crystals, the fuzzy value is encoded using a
'digitized' transparent bar as shown in Fig. 1. In this manner, the fuzzy value is represented by the ratio of the
number of transparent holes to the total number of holes. The fuzzy variables A and B shown in Fig. 1 are equal to
0.6 (6/10) and 0.8 (8/10), respectively. Similar to binary logic, any fuzzy function $f$ can be written in disjunctive
normal form, according to De Morgan's law [2], i.e.,
$f = \max\{\min\{A_1, B_1, C_1, \ldots\}, \ldots, \min\{A_i, B_i, C_i, \ldots\}, \ldots\}$, or in
shorthand notation, $f = \max\{\min\{A, B, C, \ldots\}\}$, where A, B, C, etc. represent fuzzy vectors with each element
representing a fuzzy variable. This max-min operation first gives a minimum for each column of elements of A, B,
C, etc. Then a maximum is calculated among all these minima. The simplest case is the max-min operation between


Fig. 1 Encoded fuzzy variables A and B

Fig. 2 Schematic drawing of a photorefractive fuzzy logic processor
two fuzzy vectors. In what follows we describe how such a 2-input fuzzy logic controller can be implemented in
photorefractive crystals.

Fig. 2 shows the schematic diagram that describes the principle of operation of the photorefractive fuzzy
logic controller. Both encoded patterns of fuzzy vectors A and B are placed at the front focal plane of lens L1. At the
rear focal plane of lens L1, a photorefractive crystal is placed as a volume holographic medium. The recorded
grating is then read out by a set of read beams, which consists of a full row of transparent holes located in the
front focal plane of lens L2. Note that the crystal is also located at the rear focal plane of lens L2 so that the Bragg
condition can be matched. The output of the max-min operation, the diffracted beam set, is then directed by a beam
splitter to the output plane located at the focal plane of lens L3.

To implement the max-min operation, both vectors are aligned as shown in Fig. 3.
The elements $A_i$ and $B_i$ of both vectors are aligned in the y direction. The fuzzy values $\mu(A_i)$ and $\mu(B_i)$ are
represented by the number of transparent holes (between 0 and N) aligned in the +x and -x directions, respectively. In
other words, each row represents an element of the fuzzy vector, and each pattern contains M elements aligned in the
y direction (vertical). An array of incoherent lasers (M×N) is used to illuminate the two patterns so that at most
M×N photo-induced gratings can be formed. Therefore, for each corresponding pair of fuzzy elements
($A_i$, $B_i$), the number of gratings recorded in the photorefractive crystal is equal to $\min\{A_i, B_i\}$. It is important
to note that these recorded gratings are not all independent. With the help of normal surfaces, it has been pointed out
that the gratings recorded by hole pairs in the same two rows of the patterns are degenerate. During the readout,
a row with N open holes placed at the front focal plane of lens L2 is illuminated. Light from this row is
counterpropagating with one of the rows of pattern A. Due to grating degeneracy, each of the read spots will
Bragg-match M possible degenerate gratings, while the number of diffraction spots in the output pattern will equal the
maximum number of nondegenerate gratings recorded by all elements. In this way, the maximization operation is
realized.


Fig. 3 Arrangement for the two fuzzy vectors during the recording

To demonstrate the principle of operation of this photorefractive fuzzy logic controller, we implemented the
max-min operation for two fuzzy vectors with 5 elements each. For simplicity, the fuzzy logic value of 1 was
represented by 5 transparent holes in our experiment, and we used the 5 strongest lines of an Ar-ion laser as the
incoherent light source. In this way, each column in pattern A was coherent with one column in pattern B. Although
there were many unwanted gratings, e.g., a grating recorded by one hole in $A_1$ with another hole in $B_2$, all the
unwanted diffraction spots fell outside the output pattern and were completely filtered out with a mask.
The experimental result of the max-min operation for the two fuzzy vectors A = [0.4, 1.0, 0.4,
0.6, 0.8] and B = [0.4, 0.6, 0.4, 1.0, 0.8] is shown in Fig. 4. The output of the operation is equal to 0.8, represented
by the presence of 4 diffraction spots in the output pattern.
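
The arithmetic behind this result can be checked with a short numerical sketch. It is only an illustration of the max-min operation in the hole-count encoding described above, not a model of the optics; the function names are hypothetical, and the hole count per value (5) is taken from the experiment.

```python
# Minimal sketch of the max-min operation in the hole-count encoding.
# Function names are illustrative assumptions, not from the paper.

N_HOLES = 5  # a fuzzy value of 1 is represented by 5 transparent holes

def encode(value, n_holes=N_HOLES):
    """Number of transparent holes representing a fuzzy value in [0, 1]."""
    return round(value * n_holes)

def recorded_gratings(a, b):
    """Recording step: each element pair yields min(holes_a, holes_b) gratings."""
    return [min(encode(x), encode(y)) for x, y in zip(a, b)]

def output_spots(a, b):
    """Readout step: the number of diffraction spots is the maximum over all rows."""
    return max(recorded_gratings(a, b))

A = [0.4, 1.0, 0.4, 0.6, 0.8]
B = [0.4, 0.6, 0.4, 1.0, 0.8]

gratings = recorded_gratings(A, B)   # per-row minima, in holes
spots = output_spots(A, B)           # maximum of the minima, in holes
fuzzy_output = spots / N_HOLES       # decoded fuzzy value

print(gratings, spots, fuzzy_output)
```

With the vectors used in the experiment, the per-row minima are 2, 3, 2, 3 and 4 holes, so the maximum is 4 holes, i.e. a fuzzy value of 0.8, in agreement with the four diffraction spots reported.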

Fig. 4 Experimental result of the fuzzy logic processor for two fuzzy vectors A and B (panels: $A_i \wedge B_i$; Max{Min{A, B}})

It is worth noting that grating degeneracy plays an important role in our proposed fuzzy logic processor. By
using grating degeneracy, neither a lens array nor optical fanout elements are needed. Furthermore, no thresholding
operations are needed. Thus, the whole system is all-optical and easy to implement. In addition, such a system can
handle two very large fuzzy vectors, or even matrices. For a crystal of thickness L = 0.5 cm and a wavelength λ = 0.5 μm,
the angular separation between adjacent holes can be as small as $10^{-4}$, which means that 1000 holes can be contained
within a numerical aperture of 0.1. Therefore, such a processor can deal with vectors with up to 1000 elements.
If an accuracy of 0.01 is desired (100 holes are needed in the x direction), then we can process fuzzy matrices of size
10×1000 in parallel. The speed of this processor can be estimated as follows. The number of operations for a max-min
processor for two 1000×10 fuzzy matrices is of the order of 10×1000×1000 = $10^7$, and the response time of
photorefractive crystals is about 1 ms. Hence, the speed of the fuzzy processor is about $10^{10}$ op/sec. This speed can
be further increased if crystals with a faster response time are employed and/or a lower precision of data encoding is
allowed.
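
The estimate above can be reproduced with a few lines of arithmetic. This is a sketch under one stated assumption: the angular separation between adjacent holes is taken to be of the order of the wavelength divided by the crystal thickness, which is not spelled out in the text but matches the quoted value. All variable names are illustrative.

```python
# Back-of-the-envelope check of the capacity and speed estimate.
# Assumption (not stated explicitly in the text): the minimum angular
# separation is of the order of wavelength / crystal thickness.

wavelength_m = 0.5e-6        # 0.5 micrometres
thickness_m = 0.5e-2         # 0.5 cm
numerical_aperture = 0.1
response_time_s = 1e-3       # about 1 ms for photorefractive crystals
holes_per_value = 100        # accuracy of 0.01 in the x direction

angular_separation = wavelength_m / thickness_m
holes_per_axis = round(numerical_aperture / angular_separation)

columns = holes_per_axis // holes_per_value               # values per row
operations = columns * holes_per_axis * holes_per_axis    # order-of-magnitude count
ops_per_second = operations / response_time_s

print(angular_separation, holes_per_axis, columns, operations, ops_per_second)
```

Under these assumptions the sketch gives about 1000 holes per axis, 10 values per row, of the order of 10^7 operations per exposure, and roughly 10^10 operations per second.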

In conclusion, we have proposed and demonstrated a novel optical fuzzy logic processor by using grating
degeneracy in photorefractive crystals. Max-min operations, which are used to process fuzzy vectors in disjunctive
normal form, can be easily realized in parallel. The fuzzy processor has advantages such as a simple data encoding
scheme, high accuracy, and freedom from thresholding operations. This work is supported, in part, by a grant from the US
Air Force Office of Scientific Research. Pochi Yeh is also a Principal Technical Advisor at Rockwell International
Science Center.
Optical neural network computing is of great interest for massively
parallel computing. In recent years, CCD cameras, optoelectronic smart pixels and
spatial light modulators (SLMs) with high spatial resolution have been reported [1,2]. In
some cases, however, the interface between 2-D inputs and parallel neural computing
systems, or between the computing systems and output devices, is not parallel but serial.
The bandwidth of the interface between the I/O systems and the main computing system
is limited, and this limits the performance of the total system. Such a problem is
sometimes called the I/O bottleneck. An all-optical parallel neural computing system with
highly parallel I/O capability has been reported [3,4]. The holographic associative
memory system, however, has limited functions and performance because of the limited
flexibility of optical systems. An alternative approach is to employ functional
optoelectronic systems for wide-bandwidth input data, which can compress the data for
the neural computing system. In this paper, a network system that includes an
electronic parallel interface or preprocessor is described, and a generic interface device
using nonlinear organic material for such a system is finally proposed.

Figure 1 shows the concept of the optoelectronic parallel input interface for a
neural computing system. This system consists of a microlens array, parallel
optoelectronic circuits and a parallel electronic output system to a neural system or an
LED array for further optical cascading. The optoelectronic circuits detect input data and
perform simple parallel processing, for example, local averaging, edge detection or
thresholding. For preprocessing, the microlens array performs multiple imaging of the
input image [5] or locally averages the input image. The local averaging allows us to
realize data reduction of the input data, restricted shift- and rotation-invariance and
noise reduction.

An experimental system consisting of an interface and a neural computer is shown in Fig. 2.
A microlens array of 10x10 Selfoc microlenses [6] and 4x4 PIN photodiodes with
operational amplifiers are combined into a parallel interface. The neural network computer
has 7 neural chips with 33 Gups (update connections per second), which can be organized as 3-
or 4-layer neural networks. An input image of binary 64x64 pixels is locally averaged by
4x4 Selfoc lenses and detected by 4x4 PIN photodiodes. The data rate of this system is
about 10 μs.
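
The data reduction performed by local averaging can be illustrated numerically. The following sketch only mimics the 64x64 to 4x4 reduction in software; the test pattern (a filled square) and all function names are assumptions made for illustration and do not come from the experiment.

```python
# Sketch of local averaging: a binary 64x64 image is reduced to 4x4 values,
# one per lens/photodiode pair. Test pattern and names are hypothetical.

def local_average(image, blocks):
    """Average a square image over blocks x blocks equal tiles."""
    step = len(image) // blocks
    result = []
    for by in range(blocks):
        row = []
        for bx in range(blocks):
            total = sum(image[y][x]
                        for y in range(by * step, (by + 1) * step)
                        for x in range(bx * step, (bx + 1) * step))
            row.append(total / (step * step))
        result.append(row)
    return result

def filled_square(size, top, left, side):
    """Binary test image containing one filled square."""
    return [[1 if top <= y < top + side and left <= x < left + side else 0
             for x in range(size)] for y in range(size)]

original = local_average(filled_square(64, 16, 16, 32), 4)
shifted = local_average(filled_square(64, 17, 17, 32), 4)  # one-pixel diagonal shift

# Largest change in any of the 16 averaged values caused by the shift
max_change = max(abs(a - b)
                 for row_a, row_b in zip(original, shifted)
                 for a, b in zip(row_a, row_b))
print(max_change)
```

In this toy case the 4096 input pixels are reduced to 16 values, and a one-pixel shift changes each averaged value only slightly, which is the restricted shift tolerance that local averaging provides.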

We have carried out a demonstration experiment using the developed interface and the
neural computer to evaluate the ability of such a parallel interface. The model we
implemented is a 4-layer neural network consisting of 16 neurons each for the input, the
second and the third layers and 3 neurons for the output layer. Eight alphabet characters
are learned and the learning is completed after 1125 iterations. Figure 3 shows one of the
association results for shifted inputs. The numbers of correctly associated characters for
the computer simulation and the experiment are plotted against the amount of relative shift.
This experimental result shows that the parallel interface we proposed can reduce the
input data for neural computing and can also perform simple preprocessing, such as local
averaging, which gives shift- and rotation-invariance to the neural computing system.

In order to integrate the parallel interface we proposed here, a generic SLM is
designed based on nonlinear organic material as shown in Fig. 4. This device is
composed of a microlens array, a PMMA-based poled polymer film sandwiched between
transparent electrodes, a dielectric mirror and a photosensor array with simple driving
and data processing circuits. The modulation speed is estimated to be more than 1 MHz.

This research is supported in part by a Grant-in-Aid for Scientific
Research from the Japanese Ministry of Education.
Fig. 2 Experimental system.

Fig. 3 Association for shifted images (E: Experiment, S: Simulation).

Fig. 4 Nonlinear organic material SLM.
Implementation of optical logic operations by micro-optical cascading of an array of differential PnpN-thyristor pairs

Karl-Heinz Brenner, Werner Eckert, Edwin Gobel, Neil McArdle, Jorg Moisei, Christoph Passon


Introduction 

The implementation of optical logic operations has been studied widely and has been demonstrated
in various types of systems [1,2,3]. A major goal for implementing these types of systems in
future applications is miniaturization and integration. A design for an integrated version of an
optical symbolic substitution system, which can be implemented with existing micro-optical
components, was presented recently [4]. In this paper, as a first step towards a fully integrated
system, we demonstrate basic logic operations on an array by cascading two active devices.


Active  devices 


The active devices designed for the system consist of an array of PnpN-photothyristors [5], where
two neighboring thyristors (a, a') are connected by a common load resistor to form a differential pair
(fig. 1). Each pair operates as a two-pixel 'winner takes all' system, hence only one pixel of the pair
emits light when current is applied to the device. The binary data are consequently represented in
dual rail code. Each thyristor pair represents one bit of information by the position of the pixel that
emits light. Dual rail coding is advantageous both from the viewpoint of logic implementation
(simplification of symbolic substitution systems) and for system reliability (reduction of the
influence of background light). The device array consists of 8x8 differential pairs, logically divided
into four subarrays of size 4x4. The size of each pixel is 30 x 30 μm².


Fig. 1: Differential pair of optical thyristors


Design  of  the  optical  system 


The width of the full input array is approx. 800 μm. The microlenses used have a diameter of
250 μm and a numerical aperture of 0.1. The optical system consists of two imaging stages
(fig. 2). The first stage images the two individual subarrays A and B to a filter plane F. Each
data plane is represented by one 4x4 subarray of differential pairs in the active device. The
second stage images the filter plane onto the second active device. The microprisms attached to
the microlens substrate are fabricated by thermal molding and casting [6] and perform the shifts
of the copies of the data-planes needed for the logical operations. Field lenses in the filter plane
are included to reduce loss of light by imaging the apertures of the imaging systems onto each
other.

For testing purposes an LCD display is used for data input. The data are coupled into the path of
one microlens via a beamsplitter. With a second beamsplitter the output result is observed by a
CCD camera.


Optical  logic  operations 


Fig.  3;  Experimental  setup  with  input  and  observation  optics 


Almost all techniques for implementing logical operations require the generation of multiple
copies of data-planes and the superposition of these copies on a thresholding device. Optical
Array Logic, Image Logic Algebra, Mathematical Morphology and Symbolic Substitution are
all based on this kind of operation and can thus be implemented with the demonstrated system
design. The logical operations demonstrated in this paper show the feasibility of combining
microoptical systems with active devices.

Basic  logic  operations 

In our first experiment two individual data-planes of the same dimensions are taken as the input
and are exactly superposed, so that each bit of data-plane A overlaps with the corresponding bit of
data-plane B. The result of this superposition can be taken from the tables. Tables 1 and 2 show the
relative intensities on each of the dual rail pixels r and r'.

In the case Aij = NOT(Bij) the intensity on both pixels is equal, resulting in an undefined state of
the dual rail pair. This requires the introduction of a bias light onto one of the pixels. The choice of
position defines the final logic operation. If the bias light is set on pixel r, the logic operation is
Rij = Aij OR Bij. With the bias light on r', the logic operation is Rij = Aij AND Bij.

Table 1: Intensities of the overlap result pixel r

To implement a NAND or a NOR operation, a NOT operation has to be performed on the AND or OR
results. This NOT operation can be realized by generating two copies of the resulting data-plane.
Each copy has to be filtered in an intermediate image plane. In one copy the left pixels r and in the
other copy the right pixels r' of every dual rail bit have to be filtered out. These filtered
images are then overlaid with a relative shift of two pixels, so that the former left pixel r is
now positioned on the right pixel r' and vice versa.

Table 2: Intensities of the overlap result pixel r'
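
The dual rail logic described in this section can be summarised in a small truth-table sketch. It is an idealised model, not a description of the device physics, and it rests on two assumptions that the text does not state: a lit pixel r is taken to encode a logical 1 (which is consistent with a bias on r giving OR), and the bias light is taken to be weaker than a single data beam. All names are hypothetical.

```python
# Idealised dual rail logic with a winner-takes-all pair and a bias light.
# Assumptions: light on pixel r encodes logical 1; bias is weaker than one data beam.

BIAS = 0.5  # relative intensity of the bias light (assumed value)

def encode(bit):
    """Dual rail code as (intensity on r, intensity on r')."""
    return (1, 0) if bit else (0, 1)

def winner(r, r_prime):
    """Winner-takes-all: the pixel receiving more light switches on."""
    return 1 if r > r_prime else 0

def superpose(a, b, bias_on):
    """Overlay data-planes A and B and add the bias on pixel 'r' or "r'"."""
    r = encode(a)[0] + encode(b)[0]
    r_prime = encode(a)[1] + encode(b)[1]
    if bias_on == "r":
        r += BIAS
    else:
        r_prime += BIAS
    return winner(r, r_prime)

def invert(bit):
    """NOT: exchange the positions of r and r' before detection."""
    r, r_prime = encode(bit)
    return winner(r_prime, r)

for a in (0, 1):
    for b in (0, 1):
        or_result = superpose(a, b, bias_on="r")
        and_result = superpose(a, b, bias_on="r'")
        print(a, b, or_result, and_result, invert(or_result), invert(and_result))
```

In this model the bias only decides the tie that arises when Aij = NOT(Bij), which is why its position alone selects between OR and AND, and the pixel exchange turns these into NOR and NAND.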


Neighborhood  operations 

In contrast to the basic point-to-point operations described above, symbolic substitution and
image processing require a shift of the data plane copies in the optical implementation. The
amounts of shift and the number of copies to be overlaid are determined by the specific
symbolic substitution rule or image processing operation.




The optical setup for neighborhood operations differs from the first system in principle only by the
angles of the prisms. These angles are now defined in the first imaging stage to produce multiple
copies of the data plane A and in the second imaging stage to perform the desired shifts on the
result plane R. Here we implement a neighborhood operation where four pixels are shifted onto
their common neighbor, as described in fig. 5.

Fig. 5: Nearest neighbor operation


Conclusion 

The  microoptical  system  demonstrated  here  is  capable  of  performing  the  basic  operations  of 
copying,  shifting  and  overlapping  of  data-planes.  It  demonstrates  the  cascadability  of  PnpN- 
thyristor  arrays  using  microoptical  components. 

The demonstration system is built as a hybrid optical system (micro-optical and standard
components) in order to input data with an LCD display and to observe the output with a CCD camera.
Detailed experimental results will be given.
Laterally inhibitive connections form a basic component of many neural network algorithms.
This  paper  describes  a  self-linearised  inhibitory  test  system  (SLITS)  to  demonstrate  basic  image 
manipulation  using  arrays  of  quantum  well  (QW)  modulators.  Inhibition  between  neighbouring 
nodes  is  utilised  to  perform  edge  contrast  enhancement  [1].  System  interconnections  are  both 
optical  and  electrical  with  non-local  interconnections  being  made  optically  using  diffractive 
elements  and  a  one-to-one  electronic  connection  providing  the  inhibitory  response. 

System  Background 

The objective of the SLITS is to modify a 1-D input pattern to increase contrast in areas
of rapidly changing intensity. Consider, for example, a background illumination with a bright
central region (fig. 1) falling onto a group of locally connected cells. If neighbouring cells inhibit
one another's output in proportion to the incident signal, then areas of uniform signal show a low
output. In the rapidly changing areas, cells next to the bright region will be more inhibited while
those next to the dark region are less inhibited than their neighbours, thereby improving contrast.
Thus the SLITS performs a simple image pre-processing stage analogous to the retina. Details of
the SLITS construction and experimental results are presented.

Device  Arrays 

The  devices  used  are  asymmetric  Fabry-Perot  modulators  (AFPMs)  which  differ  from 
normal  SEED  devices  only  in  the  respect  of  having  a  relatively  high  front  surface  reflectivity 
(30%  in  this  case).  This  means  that  the  active  multi-quantum  well  region  is  situated  in  an  optical 
cavity  and  when  the  absorption  is  changed  with  a  0-9V  bias,  a  large  reflectance  change  (>60%) 
with  enhanced  modulation  ratio  (>15)  results.  As  the  structure  is  that  of  a  pin  diode  the  devices 
also  operate  as  efficient  photodetectors. 

Two linear arrays of AFPMs are used [2] to act as modulators and as detectors
respectively. Each array consists of 21 rectangular devices which measure 80 μm x 2.5 mm each
with a 100 μm pitch.

Self-Linearisation  for  Inhibition 

A  means  of  providing  an  inhibitory  signal  between  a  photodetector  and  a  modulator  is 
provided  by  a  negative  feedback  effect  observed  in  QW  pin  diodes  called  self-linearisation  [3,4]. 
A  current  source  (such  as  a  photodiode)  placed  in  series  with  the  QW  modulator  can  be  used  to 
control  its  reflectivity.  The  transfer  curve  (fig.2)  shows  that  for  increasing  control  current, 
provided  by  the  detector,  the  modulator  reflectivity  falls  linearly.  This  is  the  basis  of  the 
inhibitory signal. Since the modulator current must equal the control photocurrent, detected
signal amplification can be provided using current mirrors. The electrical connection is therefore
a detector and current mirror acting as a current sink in series with a modulator. This simple
circuit can be engineered to alter the slope and shape of the transfer curve.

Optical System

The connectivity of the system is depicted in Fig. 3. An array of modulator devices is
illuminated with an input pattern and the reflected signal from each individual device is divided
between three nearest neighbouring cells on each side. The output of each detector is fed back
electrically to its paired modulator. The modulator reflectivity is determined by its neighbouring
cells and not by the input.

A schematic and a photograph of the SLITS setup are shown in figures 4 and 5
respectively. The lenses used are four 42 mm focal length triplet lenses which act as Fourier
transform lenses (L1, L4) and for imaging (L2, L3). The optics are mounted on a steel slotted
plate for stability and ease of alignment. The laser diode source and device arrays are mounted
off-plate.

Design  and  Fabrication  of  Fan-Out  and  Interconnect  Elements 

In order to generate the 1x21 input array of beams, a 16-level kinoform with rectangular
cell structure is used which was designed using a simulated annealing algorithm. The input
element (K1) has a period of ~510 μm, a theoretical efficiency of 96% and an array
non-uniformity of 0.19%.

The interconnect element (K2) similarly is a 16-level kinoform structure. Here a grating
design (period ~720 μm) was used which generates 6 ON beams (i.e. ±1, ±2, and ±3 orders)
embedded within a 1x39 order signal window with OFF orders 0, ±4, ±5, ..., ±19 suppressed to
<1.6% of the ON beams. This prevents self-inhibition and keeps the number of nearest neighbour
connected cells on each side to three. The diffraction efficiency of the ON beams in this case is
81% with a non-uniformity of 0.16%.

Both  of  these  elements  are  fabricated  in  fused  silica  using  standard  electron-beam  and 
photo-lithographic  techniques  followed  by  reactive-ion  etching. 

Optical  Interconnects 

Diffractive element K1 and lens L1 generate a uniform linear input array of 20 spots. A
mask positioned at the L1 Fourier plane selects the input pattern and this is imaged via lens L3
onto the modulator devices. A second grating, K2, is used to provide the required fanout shown
in figure 3. The light reflected from each modulator device is split equally between its 3 nearest
neighbouring cell detectors on each side but not onto itself.

To avoid interference effects when coherent beams fall onto the same detector [5] the full
length of each device has been used. The spots are input along the diagonal of the modulator
array so that when they are fanned out each falls onto a different portion of the detecting devices.
Note that since both sets of devices have the same orientation the spacing of spots onto the
modulators is √2 times that onto the detectors.

Signals from 6 adjacent cells are summed optically onto each detector and this in turn
reduces the reflectivity of its paired modulator by a proportional amount according to the
self-linearisation mechanism. Once the system stabilises, on a timescale determined by the electrical
response, the final solution may be read from the modulator array. A more complex system
where detectors and modulators are integrated together in a monolithic 2D array can be
envisioned for fully parallel processing.
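
The inhibitory feedback loop described above can be sketched as a simple numerical relaxation. This is a toy model under stated assumptions rather than a simulation of the actual devices: the modulator reflectivity is taken to fall linearly with the light summed on its paired detector, and the maximum reflectivity (0.6), the feedback gain (0.05), the test pattern and all names are illustrative values chosen here. Only the connectivity (orders ±1, ±2, ±3 on, order 0 suppressed) and the 21-cell array size follow the text.

```python
# Toy model of the SLITS feedback loop: 21 cells, each inhibited by the
# reflected output of its 3 nearest neighbours on each side (no self-inhibition).
# Reflectivity values, gain and input pattern are assumptions for illustration.

ON_ORDERS = (-3, -2, -1, 1, 2, 3)  # fanout orders that carry light; order 0 is suppressed

def slits_outputs(inputs, r_max=0.6, gain=0.05, iterations=100):
    """Iterate until the reflected outputs settle to a steady state."""
    n = len(inputs)
    outputs = [r_max * x for x in inputs]  # start with no inhibition applied
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Light summed optically on the detector paired with modulator i
            detected = sum(outputs[i + k] for k in ON_ORDERS if 0 <= i + k < n)
            # Self-linearisation: reflectivity falls linearly with detector current
            reflectivity = max(0.0, r_max - gain * detected)
            updated.append(inputs[i] * reflectivity)
        outputs = updated
    return outputs

# Dim background with a bright central region, as in the contrast example
pattern = [0.2] * 7 + [1.0] * 7 + [0.2] * 7
result = slits_outputs(pattern)
print([round(v, 3) for v in result])
```

In this model the bright cells at the edge of the central region end up brighter than those in its middle, and the dark cells adjacent to the bright region end up darker than those far from it, which is the edge contrast enhancement the system is designed to show.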
Fig. 1. Contrast enhancement: (a) input pattern intensity, (b) connectivity - 3 nearest neighbours inhibited uniformly, (c) resulting output pattern.

Fig. 2. Electrical transfer curve using self-linearisation.

Figure 3. Interconnection schematic showing how one input beam is distributed. This is repeated for each device.

Figure 4. SLITS baseplate and components.

Figure 5. Photograph of SLITS assembly.
Recent developments in smart pixel device fabrication have enabled researchers to design and
develop optoelectronic systems that combine the parallelism and connectivity of optics with electronic
control and processing. Users of these devices need to be able to test the
components at various stages of development. In particular, the AT&T/ARPA CO-OP FET-SEED
platform has enabled groups in the community such as ourselves to work on our own smart pixel
device designs in a co-operative workshop [1, 2]. We have developed and are using a custom optical
and electronic probe station for the testing of smart pixel devices. The test station allows us to input
and extract optical and electronic signals from the various parts of the smart pixels in order to
characterize their behavior and performance. This feedback is essential for device and system
development.

A  specific  objective  in  the  development  of  the  test  station  was  the  flexibility  of  use,  to  allow  full 
testing  of  the  devices.  In  addition,  the  arrangement  is  stable  and  inexpensive.  The  optical  part  of  the 
test  station  includes  provision  for  optical  beams  from  up  to  six  CW  and/or  pulsed  diode  lasers  that  can 
be  introduced  to  the  optical  windows  as  focal  spots.  The  electrical  inputs  and  outputs  are  transferred 
via  electrical  probes  for  low  speed  unpackaged  chip  testing,  or  fixed  connections  made  within  a  high 
speed  chip  package.  The  optical  system  for  generating  the  focal  spots  is  mounted  using  the  semi- 
kinematic  slotted  base-plate  approach  originally  developed  at  AT&T  [3].  Orthogonal  slots  define  the 
mechanical  and  optical  axes.  Beamsplitter  cubes  mounted  at  the  intersections  allow  beam  splitting 
and  recombining  of  the  laser  outputs,  and  extra  slots  provide  for  the  routing  to  output  detectors.  This 
arrangement  provides  the  facility  to  include  a  number  of  independently  controllable  optical  beams 
using  the  orientation  of  polarizers  and  analyzers,  or  retardation  plates  to  vary  the  beam  intensities 
reaching  the  device  plane.  The  plate  is  mounted  as  a  platform  above  the  chip  which  is  mounted  on  an 
x-y  translation  stage.  The  optical  probes  are  routed  via  a  long  working  distance,  0.3  NA,  objective 
lens down onto the device windows. A photograph of the system is shown in figure 1, and the system is represented
schematically in figure 2. The optical outputs from the devices are routed back onto the plane of the
test  plate  and  split  between  detectors.  The  LED  illuminated  device  is  imaged  onto  a  CCD  camera 
using  a  zoom  lens  and  observed  on  a  monitor.  The  magnitudes  of  the  optical  outputs  are  monitored 
with  either  a  high  gain,  low  noise,  amplified  silicon  detector,  or  a  DC  coupled  avalanche  Si  detector 
used  with  an  AC  coupled  amplifier  for  high  speed  measurements. 
The test station allows us to investigate and characterize our smart pixel chips that were fabricated as
part of the ARPA CO-OP FET-SEED workshop. Testing has been carried out for various devices
on the chip. Test structures were included in the design to allow investigation of the basic electronic and
optical properties. These included simple MQW optical modulators and FET structures. Figure 3
shows the digitized image of part of the chip as seen on the monitor, showing two focal spots incident on
optical windows of a FET-SEED transmitter. The electrical response of this device is shown for
optical inputs of 7 µW and 41 µW respectively. This clearly shows the effect of the optical signal power
levels on the rise and fall times (tr = 85 µs, tf = 102 µs and tr = 12 µs, tf = 19 µs respectively). Further
testing of other devices has been carried out, in addition to testing of the eight-bit transmitter/receiver
circuits that will be used in a system demonstrator implementation. In summary, we will present the
issues and design of the probe station and the characterization of the devices that are to be
implemented in our next system demonstration.

This  research  is  supported  by  the  Advanced  Research  Projects  Agency  of  the  Department  of  Defense  and  was  monitored  by 
the  Air  Force  Office  of  Scientific  Research  under  Contract  No.  F49620-92-C-0050.  The  United  States  Government  is 
authorized  to  reproduce  and  distribute  reprints  for  governmental  purposes  notwithstanding  any  copyright  notation  hereon. 
Figure 1. Photograph of the smart pixel test station

Figure 2. Schematic of optoelectronic test system.

Figure 3. Electrical response of FET-SEED transmitter

Figure 4. Photograph of FET-SEED transmitter with focal spots incident on two windows (highlighted in white rectangle)

Introduction

Many  early  vision  and  image  processing  algorithms  are  characterised  by  relatively  simple 
processing,  dependent  on  a  weighted  sum  of  pixels  within  a  specified  surrounding 
neighbourhood. Many algorithms, as shown schematically in fig. 1a, involve a summation over
some neighbourhood of radius r where the output depends only on the inputs. Serial processing
of such algorithms can be efficiently implemented using pipelined architectures. However, many
interesting algorithms are of the type illustrated in fig. 1b, where

$$ y_i = y_i\Bigl(I_i,\ \sum_{-r}^{+r} y_{i-r}\Bigr) $$

involves a dependence on the input $I_i$ and the output of neighbouring pixels $y_{i-r}$. Typical
applications  might  involve  some  type  of  continuity  condition  over  neighbouring  outputs.  This 
type  of  recurrent  algorithm  is  far  more  difficult  to  process,  requiring  repeated  iteration  to 
convergence  in  serial  processing,  and  thus  lends  itself  naturally  to  parallel  processing.  Many 
cellular  processing  systems  involving  nearest  neighbour  electrical  intraconnection  have  been 
studied.  However,  extending  electrical  connectivity  beyond  the  nearest  neighbour  presents  great 
difficulties  in  fabrication  and  device  area  use.  Optical  implementation  of  the  fixed,  dense, 
recurrent  connectivity  required  seems  very  attractive.  We  would  like  to  explore  some  of  the 
possible  architectures  and  practical  limitations  of  such  optical  intraconnection. 

The  relative  simplicity  of  the  individual  algorithms  implies  that  any  image  processing 
system  for  control  or  decision  making  will  generally  involve  a  cascade  of  functions  in  a  modular 
hierarchy.  If  the  advantages  of  parallel  processing  are  to  be  exploited,  then  parallel 
interconnection  between  processing  stages  must  be  maintained.  Again,  optics  presents  the  best 
possibility.  Cascaded  operation  must  also  include  some  means  for  optical  signal  regeneration. 
Thus  our  generic  image  processing  array  should  support  optical  intraconnection  between 
neighbouring  pixels  of  an  array,  and  cascaded  optical  interconnection  between  arrays. 

In  the  following  sections  we  consider  a  number  of  possible  optical  implementations.  A 
single  plane  compact  geometry,  where  the  cascaded  interconnections  have  been  designed  for 
easy  alignment,  is  presented  in  detail.  This  and  other  systems  are  examined  to  discover 
limitations  to  intraconnection  imposed  by  physical  optics  and  power  budgets. 

Physical  optics 

There  are  two  general  possibilities  for  implementation.  The  first  method  is  most  akin  to 
conventional  optical  systems,  where  there  is  a  sequential  stack  of  a  light  source/modulator  plane 
with  some  local  electronic  processing,  some  means  for  optical  fan-out  interconnection,  and  a 
detection  plane,  as  depicted  in  fig.  2.  The  nature  of  the  recurrent  calculation  requires  that  there  be 
a  one-to-one  connection,  either  optical  or  electrical,  back  to  the  source  plane,  where  the  node 
value  can  be  (opto)electronically  evaluated.  Ideally,  for  simplicity,  the  source  and  detector  planes 
should be back-to-back; however, this is difficult and leads to very long optical interconnect
paths.  Some  method  of  output,  sufficient  to  act  as  input  to  the  next  stage,  must  also  be 
incorporated.  A  number  of  different  methods  of  optical  fan-out  can  be  envisioned,  such  as 
shadow  casting,!  aperture  division  correlation,^  or  holographic.^  The  performance  of  individual 
systems  can  be  maximised  by  combinations  of  bulk,  micro-,  hybrid  or  diffractive  optics. ^ 

The alternative approach is to combine the light source/modulator devices directly with
the  detectors  in  a  single  planar  smart  pixel  processing  array.  A  reflective  scheme  using  patterned 
mirrors  has  been  suggested.^  Another  planar  solution  using  modulators  is  illustrated  in  fig.  3, 
where  cascaded  interconnection  between  arrays  is  provided  by  hybrid  bulk/micro-optics  and 
local intraconnection is implemented with micro and diffractive optics. The optical system is
designed  to  be  implemented  in  two  solid  blocks,  with  the  bulk  lenses  and  the  prism  arrays  being 
permanently  combined  in  one  monolithic  block,  and  the  micro-optics  and  device  arrays  forming 
a  second  solid  block.  The  micro-optics  and  device  array  elements  will  have  to  be  integrated  with 
micron  accuracy,  but  experiments  with  planar  optical  systems  suggest  this  can  be  attained.^  This 
system  has  the  advantage  of  being  insensitive  to  small  lateral  displacements  of  the  two  bulk  units 
in  one  direction,  making  for  easier  alignment,  since  the  prism/lens  combination  conserves  any 
small  displacements  of  the  solid  block  of  arrays. 

Taking  reasonable  numbers  for  optical  and  device  performance  and  integration,  the 
planar  modulator/detector  approach  can  be  compared  with  the  more  conventional  stacked 
approach.  Table  1  summarises  some  estimates  for  space-bandwidth  product  (SBW)  for  a 
number  of  implementations.  The  extrapolated  array  size  for  a  given  intraconnection  for  some 
bulk  stacked  and  compact  planar  systems  is  also  listed.  A  general  point  to  note  is  that  when  the 
stacked  systems  are  implemented  in  a  compact  planar  geometry,  the  array  size  is  severely 
limited,  except  in  the  case  of  the  design  presented  here. 

Power budgets

In implementing neighbourhood intraconnection, we must consider the degree of optical
fan-out and the consequent power requirements. For convenience we have considered square
pixels (although hexagonal pixels have certain advantages). Nearest neighbour interconnection
involves fan-out to a 3x3 array, but the neighbourhood expands as Σ8r. 5th nearest neighbours
already involve fan-out to an (11x11)-1 = 120 pixel array. This level of fan-out and more has
been  demonstrated  with  array  generators,^  but  providing  the  optical  power  is  the  difficulty.  As 
always,  a  trade-off  between  power  and  speed  will  be  observed;  the  more  power  to  the  detector, 
the  faster  the  response  of  any  intervening  circuitry  and,  ultimately,  the  modulator.^  Assuming  a 
minimum power of ~1 µW at a detector, summed from 100 pixels, suggests a minimum of 10
nW per individual channel. From table 1 a typical SBW of 10^6-10^7/cm² can be expected.
Allowing for losses, this suggests a total optical power of O(1 W/cm²), which is close to the
limits  of  what  can  be  dissipated  in  heat  without  resort  to  complex  cooling  mechanisms.^  This 
would  tend  to  suggest  a  limit  on  fan-out  of  around  10x10,  all  else  being  equal. 
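
The fan-out counts and the per-channel power quoted in this section can be checked with a few lines of arithmetic. The sketch below assumes a square pixel grid, as the text does; the function names are illustrative only.

```python
def fanout(r):
    """Neighbours within r pixels on a square grid, excluding the pixel itself."""
    return (2 * r + 1) ** 2 - 1


def fanout_by_rings(r):
    """Same count built ring by ring: ring k of a square grid holds 8k pixels."""
    return sum(8 * k for k in range(1, r + 1))


# Nearest neighbours (3x3 array) and 5th nearest neighbours (11x11 array).
print(fanout(1), fanout(5))

# Per-channel power if ~1 uW at a detector is summed from 100 pixels.
detector_power_w = 1e-6
per_channel_w = detector_power_w / 100
print(per_channel_w)
```

The two counting functions agree for every radius, which is why the neighbourhood can be described either by its (2r+1) x (2r+1) footprint or as a sum of rings.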

Conclusions 

Simple  considerations  of  optical  power  and  practical  implementation  issues  suggest  limits 
to fan-out of ~10x10 for realistic array sizes, with larger fan-out leading to smaller arrays. Whether
this  is  a  sufficient  advantage  over  multi-level  electrical  intraconnect  now  being  developed  is  an 
open  question,  although  the  advantages  of  optical  interconnect  seem  clear. 

Fig. 1 Types of lateral intraconnection

Fig. 3 Planar implementation of lateral intraconnection and cascaded interconnection

Table 1 SBW of lateral intraconnection using various optical systems

with pattern mirror (MACRO)
with pattern mirror (HYBRID)

SBW (N² x M²)              tan θmax   SBW           N pixels for 7x7 fan-out
(Lo x tan θmax / 5λ)²      0.4        1.0 x 10^6    145
(Lo x tan θmax / …)²       0.4        2.5 x 10^7    715
(Lo x Lf / 2λf)²           -          6.25 x 10^6   360
(Lo x tan θmax / 120λ)²    0.4        1681          6
(Lo x tan θmax / 24λ)²     0.4        4.3 x 10^4    30
(Lo x tan θmax / …)²       1.0        2.25 x 10^6   215

SBW = space-bandwidth product; array size (Lo) = 10 mm; λ = 800 nm; tan θmax ≈ NA of macro lens or angle of HOE
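
As a rough cross-check of Table 1, the rows whose formula has the form (Lo x tan θmax / kλ)² can be evaluated directly from the footnote values Lo = 10 mm and λ = 800 nm. The helper names below are illustrative, and dividing the linear channel count by 7 to get the number of pixels per side for a 7x7 fan-out is an assumption about how the last column was obtained.

```python
import math

L0 = 10e-3    # array size Lo in metres (table footnote)
LAM = 800e-9  # wavelength in metres (table footnote)


def sbw(tan_theta_max, k):
    """Space-bandwidth product for a row of the form (Lo * tan(theta_max) / (k * lambda))^2."""
    return (L0 * tan_theta_max / (k * LAM)) ** 2


def pixels_per_side(sbw_value, fanout_side=7):
    """Assumed relation: linear channel count divided by the fan-out width."""
    return math.sqrt(sbw_value) / fanout_side


for k in (5, 24, 120):
    value = sbw(0.4, k)
    print(k, f"{value:.3g}", round(pixels_per_side(value), 1))
```

The k = 120 row evaluates to about 1736 rather than the tabulated 1681 (= 41²), which suggests the table rounds the linear channel count down to a whole number before squaring.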

1. Introduction

The development of high capacity parallel optical
memories has opened up the possibility for very high
data transfer rates from secondary storage devices
[1]. Current electronic interfaces may not have
sufficient  bandwidth  to  utilize  these  high  data  rates, 
and  thus  a  bottleneck  can  be  created  between  the 
secondary  storage  and  main  memory.  A  primary  use 
of  these  large  capacity  memories  is  storage  of 
records  in  a  database  environment,  where  the 
majority  of  transactions  comprise  searches  for  data 
that  match  a  given  search  argument  [2].  Electronic 
database  processors  have  made  good  use  of 
preprocessing  units  to  filter  the  data  being  transferred 
to  main  memory  in  an  effort  to  reduce  the  data  flow 
rates  to  usable  levels.  A  similar  filtering  unit  that  can 
operate  on  the  parallel  optical  data  output  from  an 
optical  memory  would  provide  even  greater  benefits. 
We have presented in an earlier paper a filtering unit
consisting of cascaded arrays of optoelectronic logic
elements [3]. The major components in this filtering
unit are optoelectronic XOR and AND gate arrays.
Previously we reported the demonstration of the
XOR array using standard table-top optics [4]. In
this paper we present a complete optical system
redesign using a slotted plate platform, developed
elsewhere [5]. The system uses AND and XOR
arrays in the demonstration of the projection and
selection modules of the database filter.


2.  Platform  Motivation 

The purpose of our initial tests using the optical table
was to identify potential obstacles to system
implementation using that platform. This system
required alignment of multiple nonsequential optical
paths which terminate onto 160-µm diameter input
windows with 250-µm pitch and tolerances of
approximately 10 µm. The setup and alignment of
just two optical paths onto one 3x3 XOR array
required days to accomplish. Once aligned, we found
that drift of the components necessitated realignment
every few hours. Considering these initial difficulties
we determined that an alternate optical platform was
needed. The platform needs to be stable for days
except for periodic minor readjustments. This
allows the projection and selection modules to be
assembled and optimized separately and then
interfaced together. It should be possible to align
multiple paths in a few hours rather than days.
Therefore the coarse alignment needs to be much less
time consuming than that provided by an optical
table. Also the fine adjustment must be relatively
simple and quick in order to speed up the overall
system alignment and rapidly correct for component
drift.


Legend: PDA; cylindrical lens; 1/4 waveplate; Risley prisms; macro lenses; beam splitters; polarizing beam splitters.

Figure 1: Schematic of the database filter slotted
plate design

Another critical aspect is a provision for mounting
various system components such as spatial light
modulators, hybrid VCSEL/HPT based logic arrays
as well as the more common lenses and beam
splitters. An optical platform that provides these
characteristics is the slotted plate. The slotted plate
design for the database filter is illustrated in Figure
1.


3.  Implementation 

The slotted plate offers inherent coarse
alignment in the x and y directions. Additional
stability is provided by magnets placed in the bottom
of the slots. A perspective view of a baseplate
with component holders is shown in Figure 2.


Figure 2: Conceptual perspective view of a slotted
plate with optical components

Our  implementation  of  the  projection  and 
selection  modules  of  the  filter  utilizes  two 
VCSEL/HPT  based  AND  arrays  and  one  XOR 
array.  The  projection  mask  and  selection  argument 
are  provided  by  two  transmissive  mode  spatial  light 
modulators  (SLMs).  The  board  design  showing  the 
placement  of  these  elements  is  illustrated  in  Figure  3. 
The  optical  source  for  the  data,  selection  and 
projection  inputs  is  provided  by  a  200-mW  edge 
emitting  850-nm  laser.  This  laser  is  placed  on  a 
separate,  smaller  plate  which  connects  to  the 
baseplate at the end of a slot. The plates are designed
so that the optical axis is positioned at the center of
the slot, 15 mm above the top of the plate. The slots
are 6.5 mm deep and 18 mm wide and are designed
for 35-mm OD lens holders. These holders
accommodate 25-mm diameter lenses. Both 25-mm
polarizing and nonpolarizing beam splitters are used
as appropriate in the interest of power conservation.
Risley prisms are used for fine adjustment of the
optical paths. The SLMs are 3" x 3" x 1" and the
optoelectronic (OE) RAM with mount is 3" x 3" x
2". The plate locations of all components along with
the optical signal propagation paths are shown in
Figure 1.

This  filter  design  uses  hybrid  VCSEL/HPT 
based  smart  pixels.  Therefore,  special  consideration 
was  given  to  routing  of  the  inputs  and  outputs 
between  each  stage.  The  inputs  and  outputs 
propagate  between  each  stage  in  parallel.  The 
geometric  center  between  these  two  sets  of  signals  is 
placed  at  the  center  of  the  optical  axis.  Packaging  of 
the  VCSEL  and  HPT  chips  consists  of  miniature 
boards  on  which  both  chips  are  placed  side  by  side 
along with the necessary interconnect traces. After
bonding  8X8  microlens  arrays  on  top  of  the  VCSEL 
and  HPT  arrays,  the  boards  are  mounted  onto  steel 
slugs  as  illustrated  in  Figure  3.  The  most  critical 
aspect  of  the  plate  design  is  in  the  correct 
combination  of  micro  and  macro  optics  to  minimize 
aberrations  and  still  allow  sufficient  path  length  to 
accommodate necessary beam splitters and Risley
prisms.


Figure  3:  Illustration  of  board  containing  VCSEL 
and  HPT  arrays,  mounted  on  a  steel  slug 

The  optical  signals  propagate  between  stages 
from  the  VCSEL  outputs  to  the  input  windows  of  the 
HPTs.  A  layout  of  the  HPT  XOR  array  showing  the 
input  window  area  is  depicted  in  Figure  4.  The 
HPTs are on 250-µm pitch. Each array of output
signals is collimated by an array of microlenses, also
on 250-µm pitch.

Figure 4: Layout of HPTs in XOR configuration

To  optimize  the  distance  between  successive 
stages,  two  25-mm  diameter  macrolenses  are  used 
with  focal  lengths  tailored  for  each  specific  path.  A 
typical  optical  layout  is  illustrated  in  Figure  5. 


Figure 5: Typical optical design between two
successive smart pixel array stages

4.  Testing  and  Evaluation 

The  database  filter  will  be  tested  from  the 
bottom  up.  Initially,  the  functionality  of  the  logic 
arrays  and  laser  sources  will  be  determined.  Next, 
the  SLMs  will  be  used  to  drive  the  VCSEL/HPT 
arrays.  After  this  process  is  verified,  beam  splitters, 
lenses,  wave  plates,  etc.,  will  be  added  to  the 
system.  This  will  lead  to  the  testing  of  the  selection 
and  projection  modules  separately.  Finally,  the 
modules  will  be  combined  and  the  entire  filter  will  be 
evaluated.  Test  results  will  be  obtained  optically 
through the use of a CCD camera. Eventually, the
camera will be replaced by a custom-designed OE
RAM [3] that will be optically loaded in parallel.


5.  Conclusions 

We  have  completed  the  design  of  an 
optoelectronic  database  filter  which  can  perform 
selections  and/or  projections.  We  use  hybrid 
VCSEL/HPT  logic  gate  arrays  to  perform  AND  and 
XOR  operations.  Initial  testing  of  individual 
components  of  the  filter  has  been  successful  and  the 
design  has  proven  viable.  We  now  proceed  further 
by  assembling  a  compact,  rigid  test  platform 
employing  a  slot  plate.  We  are  also  working  on 
developing  a  process  for  monolithic  integration  of 
HPT  and  VCSEL  arrays. 

Smart pixels, consisting of photodetectors,
electronic circuitry, and E/O converters
utilizing free-space optical interconnections,
show promise to relieve the interconnection
bottleneck in computing and switching
systems. To reduce the propagation delay
through a smart pixel, the receiver requires a
fast response, hence it is essential to reduce the
front end capacitance (Cin). Cin has three main
components: the photodiode active area, the
amplifier input, and the stray interconnect
capacitance (Cs). The FET-SEED technology
minimizes Cs through the monolithic
integration of photodetectors, modulators and
electronic circuitry. However, current
system demonstrations using FET-SEEDs
have been limited to using medium scale
integration (MSI) smart pixel arrays. Hybrid
integration of VLSI Si CMOS electronic
circuitry with photodetectors, modulators, or
emitters is an attractive approach to obtaining
VLSI smart pixels in the near term.

One method of attaching III-V devices to Si
CMOS is through the use of a flip-chip solder
bump process and back illuminating the
photodiode. A recent technique has been
devised where GaAs SEED detectors/
modulators are first flip-chip-bonded onto Si
CMOS, and then the GaAs substrate is etched
away, allowing operation at 850 nm. A
question to be answered is what stray input
capacitance results from this process.


Figure 1: Cross-sectional view depicting flip-chip hybrid, along with the equivalent circuit (not to scale).

This paper investigates Cs as a function of
solder bump height and diameter, using the
current process design rules. Figure 1 depicts
the cross-sectional view of the flip-chip hybrid
model along with the equivalent circuit. The
current design rules dictate that the pads be
equally sized squares spaced one pad width
apart. Circuits with 15 µm pads have recently
been demonstrated. The SEED chip has a
fixed ~2 µm overhang beyond the pad size, and
the photodiode active area is slightly larger
than one of the pads.

The total front end capacitance (Cin) was first
estimated by taking the sum of all the
contributing elements: Cin = Camp + Cdiode + Cs,
where Cs = Ctrace + Cpad + Cchip + Cbump. The
formulas used to approximate each element are
listed in Table 1. Figure 2 plots the estimated
Cin (less the fixed amplifier contribution) vs.
pad size. Our results indicate that the pad was
the dominant contributor to Cs. Solder bump
heights from 5-20 µm were found to induce
little change on Cs.
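
The stray-capacitance estimate can be sketched numerically from the Table 1 approximations for Ctrace, Cpad, Cbump and Cchip. Several inputs here are assumptions rather than values given in the text: the bump radius is taken as half the pad width, the relative permittivity of the gap under the chip is taken as 1, and the function names are illustrative.

```python
EPS0_FF_PER_UM = 8.854e-3  # vacuum permittivity expressed in fF per micrometre
C_TRACE_FF = 1.2           # fixed 2x5 um trace (Table 1)


def c_pad(d):
    """Metal-1 pad to substrate plus fringing, d in um, result in fF (Table 1)."""
    return d ** 2 * 0.031 + 4 * d * 0.044


def c_bump(r):
    """Solder bump term, 63.56 aF per um of radius, result in fF (Table 1)."""
    return 63.56e-3 * r


def c_chip(d, h, eps_r=1.0):
    """Chip over ground plane with fringing factor (1.1 h/d + 1); eps_r = 1 is assumed."""
    fringing = 1.1 * h / d + 1
    return fringing * eps_r * EPS0_FF_PER_UM * d * (d + 4) / h


def c_stray(d, h):
    """Cs = Ctrace + Cpad + Cchip + Cbump, with bump radius assumed to be d/2."""
    return C_TRACE_FF + c_pad(d) + c_chip(d, h) + c_bump(d / 2)


for h in (5, 10, 20):
    print(h, round(c_stray(15, h), 2), round(c_pad(15), 2))
```

Under these assumptions a 15 µm pad contributes roughly 9.6 fF of a total near 12 fF, and varying the bump height from 5 to 20 µm moves the total by well under 1 fF, in line with the trends reported above.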

To check the accuracy of the approximations, a
3-D Laplace/Poisson solver was used to
calculate the total input capacitance vs. pad
size for a SEED bumped to the first layer metal
of a Si wafer. The results are shown in Figure
3, and had less than 2% error in symmetry
preservation of the resulting Maxwell
capacitance matrix. The small shaded region
indicates solder bump heights ranging from
5-20 µm. The results agree reasonably well with
the estimated values (reshown as a dotted line
in Figure 3) which appear to underestimate the
fringing components of the structure.


Figure 2: Plot of estimated input capacitance as a
function of bond pad size.

To  verify  the  above  simulations,  CMOS  ring 
oscillators  have  been  designed  with  and 
without  solder  bumped  SEED  loads.  Test 
results  will  be  discussed. 

The effect of thermal conduction from the
SEED to Si substrate was also examined. The
output contrast of a SEED modulator diminishes
with change in temperature due to the
shift of the exciton (0.28 nm/°C). The amount
of heat generated in the SEED is dependent on
the impinging optical power (Pin), and its state
of absorption. Light not reflected is absorbed
as a photocurrent, which generates heat.
Assuming a modulator biased at 6 V, with Pin
= 500 µW, and a high/low state differential
responsivity of 0.2/0.6 A/W, results in
ΔP = 1.2 mW differential in heat dissipation
between the two states. Figure 4 shows the
thermal network used to model heat conduction.
For a 15 µm square pad and 10 µm bump
height, the following values were estimated:
RGaAs = 19.2k, Rbump = 1.23k, RSiO2 = 0.44k,
Rtotal = (Rbump + RSiO2) || (Rbump + RSiO2 + RGaAs) = 1.67k ||
20.9k = 1.54k. The change in temperature of the
SEED due to photocurrent would be ΔT =
(ΔP x Rtotal) = (1.2 mW)(1.54k) = 1.85°C. This
would result in a negligible drop (<0.2 dB) in
output contrast. Thus, the hybrid smart pixel
technology examined here has both acceptable
thermal and electrical performance for the current
design rules.


TABLE 1: Formulas used for the approximation of Cin

ELEMENT   APPROXIMATION                           DESCRIPTION
Camp      25 fF                                   Assumed amplifier input capacitance
Ctrace    1.2 fF                                  Interconnect to amp is a fixed 2x5 µm trace
Cdiode    (Ks) d(d+2) (115 aF/µm²)                SEED active area (fringing factor = 0.6(1/d + 1))
Cpad      d² (0.031 fF/µm²) + 4d (0.044 fF/µm)    Metal-1 to substrate + fringing
Cbump     63.56 r aF                              Capacitance between two spheres, radius = r (µm)
Cchip     (Ke)(εr)(ε0)(d(d+4)/h)                  Conductor over a ground plane (GaAs chip over Si); Ke = fringing factor (1.1h/d + 1) for h/w < 2


Figure 3: Plot of simulated input capacitance as a
function of bond pad size. Shaded region indicates
solder bump heights from 5-20 µm. For comparison,
the estimated capacitance is shown as a dotted
line.

Figure 4: Diagram depicting the thermal network
of SEED/Si hybrid.
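
The thermal estimate above is a short chain of arithmetic that can be reproduced directly. The sketch below uses only the values quoted in the text; the variable names and the reading of "k" as 10³ °C/W are assumptions.

```python
def parallel(a, b):
    """Equivalent resistance of two thermal resistances in parallel."""
    return a * b / (a + b)


# Differential heat dissipation between the two modulator states.
bias_v = 6.0
p_in_w = 500e-6
resp_low, resp_high = 0.2, 0.6                       # A/W
delta_p_w = bias_v * (resp_high - resp_low) * p_in_w

# Thermal resistances for a 15 um pad and 10 um bump height ("k" read as 1e3 degC/W).
r_gaas, r_bump, r_sio2 = 19.2e3, 1.23e3, 0.44e3
r_total = parallel(r_bump + r_sio2, r_bump + r_sio2 + r_gaas)

delta_t_c = delta_p_w * r_total
print(f"dP = {delta_p_w * 1e3:.2f} mW, Rtotal = {r_total / 1e3:.2f}k, dT = {delta_t_c:.2f} degC")
```

Carrying full precision gives Rtotal of about 1.55k and a temperature rise of about 1.86°C, matching the quoted 1.54k and 1.85°C to within rounding.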

The complexities of implementing neural network systems stem from the requirement that each neuron can receive
excitation from many inputs (1-1000, or more) and each input must be multiplied by a weight. Conventional analog
and digital electronic hardware implementations of neural architectures often use much of the available hardware to
implement the calculation of the product of the weights and inputs, and have to resort to a time-multiplexing scheme
(which allows sharing of the multiplier hardware) to implement networks with more than a few thousand neurons in
the system. This problem can be overcome by using stochastic computing techniques. This paper therefore details
the results of an investigation of the implementation of the functional components of a stochastic bit stream neuron
in optic/optoelectronic hardware. This approach offers several advantages.

• The stochastic approach represents real values through a precisely controlled probabilistic technique, which makes
possible a complete and exact mathematical description and simulation of the network functionality [1, 2].

• In contrast to analog implementations, digital networks can be combined without introducing further uncertainties
in the accuracy of the computation. Hence implementations can be scaled up without major modifications.

•  Optics  allows  the  parallelism  of  the  neural  processing  to  be  maintained. 

•  Exploiting  the  stochastic  properties  of  the  neuron's  processing  allows  the  weighted  sum  of  inputs,  the  application 
of  a  threshold  and  the  neuron's  transfer  function  to  all  be  performed  using  one  unit. 

Figure  1  shows  a  schematic  of  the  functional  components  of  a  bit  stream  neural  element.  Each  of  the  neuron 
weights  and  inputs  are  presented  to  it  as  a  temporal  sequence  of  Vs  and  0*s  where  the  o<xurrence  probability  of  a  1  in 
the  stream  is  proportional  to  the  real  value  it  represents  [1, 2].  The  corre^nding  weight  and  input  bit  str^s  are 
received  by  the  neuron  and  their  bit-wise  multiplication  is  performed  using  a  simple  XNOR  logic  gate  (this  is  a 
consequence  of  the  stochastic  bit  stream  rqiresentation).  The  ouqiuts  of  all  the  XNOR  gates  need  to  be  summed  and 
compared  with  a  probabilistic  threshold  value.  This  can  yield  sigmoid  and  linear  transfer  functions  as  a  con^uen^ 
of  the  interaction  of  the  XNOR  probability  distribution  with  the  threshold  probability  distribution.  Therefore,  the 
stochastic  summation  process  can  be  used  to  impose  the  thresholding  and  transfer  functions  of  the  neuron  on  the  data 
it  processes. 
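
The XNOR multiplication described above can be checked numerically. The sketch below assumes a bipolar coding in which a real value v in [-1, 1] is carried by a stream with bit probability p = (v + 1)/2; the text only states that the probability is proportional to the value, so this mapping and all function names are illustrative assumptions. Under that assumption, XNOR of two independent streams yields a stream whose decoded value is the product of the two inputs.

```python
import random


def to_probability(value):
    """Map a real value in [-1, 1] to a bit probability (assumed bipolar coding)."""
    return (value + 1.0) / 2.0


def to_value(probability):
    """Inverse of to_probability."""
    return 2.0 * probability - 1.0


def xnor_probability(p_a, p_b):
    """Exact probability that the XNOR of two independent bits equals 1."""
    return p_a * p_b + (1.0 - p_a) * (1.0 - p_b)


def simulate_xnor(p_a, p_b, n_bits, seed=0):
    """Estimate the XNOR output probability from two simulated bit streams."""
    rng = random.Random(seed)
    ones = 0
    for _ in range(n_bits):
        a = rng.random() < p_a
        b = rng.random() < p_b
        ones += (a == b)  # XNOR is 1 when both bits agree
    return ones / n_bits
```

For example, decoding `xnor_probability(to_probability(0.5), to_probability(-0.4))` with `to_value` gives the product of 0.5 and -0.4, and the simulated estimate approaches the same probability as the stream length grows.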

The  optical  implementation  of  a  stochastic  neural  system  requires  three  distinct  functions  to  be  realised,  either 
optically  or  as  a  mixture  of  optical  and  electronic  technology.  In  the  following  discussion  we  look  at  the  different 
options  that  can  be  thought  of  to  implement  these  functions.  We  consider  in  this  investigation  the  integration  of 
optical  thyristors  [3]  as  the  optical  logic  element  of  the  system. 

Stochastic  Sequence  Generators 

A number of parallel probabilistic bit streams must be generated. Different probability values may be assigned to
the different bit streams. This can be done by generating bit streams in which each bit is set to 1 with probability
0.5 and modulating them using stochastic processing techniques to impose probabilities corresponding to the
real values (weights and inputs) [4]. The spatially parallel channels carrying uncorrelated bit streams with
probability 0.5 can be generated either by using fibre speckle [5, 6] or cellular automata networks [7].

In the first case the speckle pattern of a multimode step index fibre illuminates an array of differential pairs of
optical thyristors [3]. The speckle pattern features well-known statistical properties, i.e. it is a gamma distribution of
which the degrees of freedom equal the number of speckle cells per thyristor [8]. Each of the thyristors in the
differential pair is subjected to the same light distribution, hence both optical thyristors have an equal chance of
switching on (if one thyristor switches on, the other is prevented from doing so), thus generating a logical 1 or a logical 0.
By subjecting the fibre to a vibratory motion (ultra-sound or turbulent air flow) and sampling the time-varying
speckle pattern (<=1 MHz) a binary sequence with probability 0.5 of a bit being set to 1 is generated. The fibre might
be replaced by a waveguide in the cross section of which the refractive index can be modulated at megahertz
frequencies  (>10  MHz)  by  a  randomly  driven  acousto-optic  modulator.  In  that  case  the  speckle  pattern  could  be 
sampled  more  often  without  introducing  time  correlations  between  consecutive  bits,  hence  enabling  the  system  to  be 
operated  at  higher  clock  frequencies. 
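
A minimal simulation of the speckle-driven differential pair is sketched below. It assumes that the two thyristors receive independent, identically gamma-distributed integrated intensities (shape parameter equal to the number of speckle cells per thyristor, unit mean) and that the thyristor receiving more light wins. The independence assumption, the value of four speckle cells, and the function names are illustrative and not taken from the paper.

```python
import random


def speckle_bit(rng, cells_per_thyristor=4.0):
    """Return one bit from a differential pair: 1 if the first thyristor wins."""
    # Integrated speckle intensity modelled as gamma distributed, with the
    # shape set by the number of speckle cells and the mean normalised to 1.
    shape = cells_per_thyristor
    scale = 1.0 / cells_per_thyristor
    first = rng.gammavariate(shape, scale)
    second = rng.gammavariate(shape, scale)
    return 1 if first > second else 0


def bit_stream(n_bits, seed=0):
    """Sample n_bits independent speckle realisations (one per clock cycle)."""
    rng = random.Random(seed)
    return [speckle_bit(rng) for _ in range(n_bits)]
```

Because both intensities are drawn from the same distribution, the fraction of 1s in a long stream should settle near 0.5, whatever shape parameter is chosen.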

On the other hand it is also possible to design a cellular automata network with cells which will output a 1 with
a  probability  of  p=0.5.  The  combinational  logic  of  each  cell  and  the  nearest  neighbour  interconnection  pattern  define 
the  update  rule  of  the  cell.  Integrating  this  logic  in  VLSI  silicon  can  make  the  fabrication  of  this  module  compact. 
Connection  via  flip-chip  bonding  of  the  silicon  circuitry  with  optical  emitters,  like  the  optical  thyristors,  will  allow 
parallel  output  of  each  bit  stream  channel. 
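
The paper does not state which update rule the cells use. As a stand-in, the sketch below uses the well-known elementary rule 30 (new state = left XOR (centre OR right)) on a ring of cells, which is a common choice for pseudo-random bit generation; the rule, the ring boundary, and the function names are assumptions made only for illustration.

```python
def rule30_step(cells):
    """Advance a ring of binary cells by one step of elementary rule 30."""
    n = len(cells)
    return [
        cells[(i - 1) % n] ^ (cells[i] | cells[(i + 1) % n])
        for i in range(n)
    ]


def centre_bit_stream(width, n_bits):
    """Collect the centre cell's state over time, starting from a single 1."""
    cells = [0] * width
    cells[width // 2] = 1
    bits = []
    for _ in range(n_bits):
        bits.append(cells[width // 2])
        cells = rule30_step(cells)
    return bits
```

Each cell only needs its own state and its two nearest neighbours, which is the locality property that makes a compact VLSI layout plausible.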

The  consecutive  modulation  can  be  implemented  with  a  dedicated  optical  thyristor  module  (in  the  case  of  fibre 
speckle)  or  a  number  of  parallel-operating  electronic  modulators  (in  the  case  of  cellular  networks).  Both  modulation 
methods  rely  on  the  same  principle,  that  is  they  locally  perform  in  each  parallel  channel  an  AND  or  an  OR  operation 
between  an  incoming  bit  stream  (probability  pin)  and  a  carrier  stream  (probability  0.5).  Whether  it  is  an  AND  or  an 
OR  operation  depends  on  the  bits  of  the  channel's  probability  value  [3].  The  outgoing  bit  streams  have  a  probability 
equal to pin/2 in the case of an AND operation and (1 + pin)/2 in the case of an OR operation. Successive modulation
steps  can  impose  any  probability  value  out  of  a  number  of  discrete  values  in  the  interval  [0,1]  (e.g.  256  discrete 
values  with  8  modulation  steps). 
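
The successive modulation can be sketched as follows. For independent streams, an AND with a 0.5 carrier halves the probability and an OR gives (1 + p)/2, so reading the bits of an n-bit code from least to most significant builds up the probability code/2^n. Starting from an all-zero stream (p = 0) and the function names are assumptions for illustration; exact fractions are used so the arithmetic is not blurred by rounding.

```python
from fractions import Fraction


def and_step(p_in):
    """Probability after AND with an independent carrier of probability 1/2."""
    return p_in / 2


def or_step(p_in):
    """Probability after OR with an independent carrier of probability 1/2."""
    return (1 + p_in) / 2


def impose_probability(code, n_steps=8):
    """Apply n_steps modulation steps selected by the bits of an integer code."""
    p = Fraction(0)  # assumed starting stream: all zeros
    for step in range(n_steps):
        bit = (code >> step) & 1  # least significant bit first
        p = or_step(p) if bit else and_step(p)
    return p
```

With eight steps, `impose_probability(k)` for k = 0 to 255 covers the 256 discrete values k/256 mentioned in the text.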

The  choice  of  implementation  of  the  bit  stream  generators  is  very  much  dependent  on  the  number  of  input 
signals  that  the  neuron  has  to  process.  Integrating  the  modulators  into  the  electronic  hardware  will  use  a  large 
amount  of  the  available  space  on  the  silicon  chip  and  thus  the  number  of  bit  stream  channels  that  can  be 
implemented  will  be  limited.  Therefore,  an  investigation  into  possible  optical  implementations  of  the  modulator 
(like  the  thyristor  module  that  we  propose)  as  well  as  optical  cellular  automata  implementations  is  necessary;  this  is 
a  long  term  consideration  of  this  research  topic. 

Multiplication  of  Corresponding  Weights  and  Inputs  using  XNOR  Logic 

The  second  functional  block  has  to  perform  the  bit-wise  XNOR  of  the  weight  and  input  bit  streams.  This 
operation  is  implemented  simultaneously  in  all  of  the  spatially  parallel  channels  using  arrays  of  optical  thyristors[9, 
10].  By  using  optics  one  can  benefit  from  the  ease  of  interconnection  and  spatial  overlap  of  the  bit  streams. 

Summation  of  Weighted-Input,  Thresholding  and  Realisation  of  Transfer  Function 

The  third  functional  component  of  the  neuron  must  compare  the  sum  of  the  weighted  inputs  with  a  threshold 
value.  An  optical  implementation  of  the  sum  and  threshold  functions  could  be  accomplished  using  a  spatial  plane 
onto  which  the  optical  output  of  the  XNOR  gates  is  imaged  and  then  summed.  Each  cell  in  the  network  (Figure  2)  is 
an  optical  detector  which  when  illuminated  will  allow  current  to  flow  in  any  direction  across  its  domain,  i.e.  bi¬ 
directional  current  flow  to  and  from  its  nearest  neighbours.  If  a  number  of  the  cells  in  the  matrix  are  illuminated, 
they will allow current to flow across their path. Imaging the output of each XNOR gate onto a separate cell in the
matrix and determining whether or not there is current flow from the top contact to the bottom contact of the plane
allows a probabilistic spatial summation to be achieved. This summation process has been simulated and shown to
have a summation output probability which is dependent on the probability of the individual cells being illuminated,
i.e.  the  average  input  probability  of  the  neuron.  Figure  3  shows  that  the  transfer  function  of  the  input  probability  to 
the  output  probability  has  a  sigmoid  shape  and  is  centred  with  a  threshold  value  of  pin  =  0.5.  The  addition  of  noise 
to  the  spatial  plane  of  detectors  to  either  turn  ON  the  detector  or  to  hold  it  OFF,  causes  a  translation  of  the  threshold 
probability  along  the  x-axis,  increasing  and  decreasing  the  threshold  level.  This  unit  can  also  derive  the  dependency  of 
a  neuron's  output  on  a  particular  input,  which  can  be  used  by  a  learning  algorithm.  If  the  detector  cell  in  the  matrix 
relating  to  an  input  is  turned  OFF  and  as  a  result  the  current  across  the  detector  array  changes  from  conducting  to 
non-conducting,  the  output  is  dependent  on  that  input.  Using  this  technique,  a  stochastic  dependency  estimator  is 
determined  that  can  be  used  by  a  learning  rule  to  train  the  neuron. 
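
The summation plane behaves like a percolation problem: the output is 1 when illuminated cells form a conducting path from the top contact to the bottom contact. The Monte Carlo sketch below illustrates this. The cell geometry of Figure 2 is not recoverable here, so the code assumes a triangular (six-neighbour) lattice stored in a sheared square array, a geometry whose site percolation threshold is known to be 0.5; grid size, trial count, and names are likewise illustrative.

```python
import random
from collections import deque

# Six neighbours of a triangular lattice stored in a sheared square array.
NEIGHBOURS = [(-1, 0), (1, 0), (0, -1), (0, 1), (-1, -1), (1, 1)]


def conducts(grid):
    """Return True if illuminated cells connect the top row to the bottom row."""
    rows, cols = len(grid), len(grid[0])
    queue = deque((0, c) for c in range(cols) if grid[0][c])
    seen = set(queue)
    while queue:
        r, c = queue.popleft()
        if r == rows - 1:
            return True
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False


def output_probability(p_in, size=16, trials=300, seed=0):
    """Estimate the probability of conduction when each cell is lit with p_in."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        grid = [[rng.random() < p_in for _ in range(size)] for _ in range(size)]
        hits += conducts(grid)
    return hits / trials
```

Sweeping `p_in` from 0 to 1 produces a steep, sigmoid-like rise in the estimated output probability. Forcing a fraction of cells ON or OFF before the search would be one way to model the threshold shift attributed to noise.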

The  architecture  discussed  in  this  paper  uses  stochastic  bit  stream  processing  to  reduce  the  hardware  requirements 
of  a  neuron  by  simplifying  the  multiplication  process  of  real  valued  inputs  and  weights  to  their  bit-wise  XNORed 
combination.  Furthermore,  the  transfer  function  is  realised  as  a  consequence  of  the  statistical  properties  of  the 
threshold and the weighted-input applied to the probabilistic summation process. At the conference we will discuss
the  neural  architecture  in  more  detail  and  will  explain  the  trade-offs  in  choosing  between  optical  and  electronic 
implementation  of  the  system's  functional  blocks. 

[Figure 1 labels: stochastic bit streams for the weights of the connections; I = stochastic input bit; W = stochastic weight bit; XNOR multiplication of input bit and weight bit; summation unit; threshold; translation of activation by linear or sigmoid activation function; stochastic output bit. Sequences of bits are produced as time goes on.]

Figure  1:  Schematic  of  the  functional  components  of  a  bit  stream  neuron 
Figure 2: Detector network for summation, thresholding and realisation of transfer function.

Figure 3: Adding noise shifts the sigmoidal transfer function.

1. Introduction

The  realization  of  practical  optical  or  optoelectronic 
computers  has  been  hampered  by  the  lack  of  algorithms 
suited  to  optoelectronic  implementation.  We  have  cho¬ 
sen  an  algorithm  that  is  particularly  compatible  with 
optoelectronic  processors  and  parallel  access  optical 
memory,  mapped  it  onto  an  architecture  which  satisfies 
the  constraints  of  the  hardware,  and  suggest  an  imple¬ 
mentation  which  is  an  appropriate  combination  of 
optical  and  electronic  technology.  The  proposed  parallel 
optoelectronic  implementation  increases  throughput  by 
several  orders  of  magnitude  over  serial  implementations, 
facilitating  the  real-time  solution  of  large  problems. 

2.  Implementation  issues 

Even many of the (inherently parallel) neural learning
algorithms that have proven useful in practice are
difficult  to  implement  in  fully  parallel  hardware.  For 
example,  backpropagation  requires  the  multiplication  of 
an  input  by  a  weight  with  numerical  precision  of  about 
13  bits^  at  each  synapse.  This  precision  requirement  is 
beyond  the  range  achievable  with  analog  information 
processing,  and  the  corresponding  digital  circuits  are 
prohibitively  large.  Also,  backpropagation  requires 
weight  transport  or  multiple  copies  of  the  synaptic 
weights^.  Finally,  since  update  information  is  stored 
for  every  weight  after  every  input  presentation,  a  parallel 
interface  to  secondary  storage  requires  a  transmitter  per 
synapse. 

Fuzzy  ARTMAP^  has  received  a  great  deal  of  atten¬ 
tion  along  with  the  other  ART  algorithms,  but  few 
implementations have been proposed. We find that it is
a  practical  algorithm  for  supervised  learning  that  has 
several  important  advantages  for  optoelectronic  imple¬ 
mentation.  In  particular,  only  the  weights  correspond¬ 
ing  to  one  processing  element  (PE)  are  updated  after 
each  training  sample.  This  makes  it  possible  to  seg¬ 
ment  a  large  problem  into  smaller  parts  during  learning, 
loading  the  page  of  weights  corresponding  to  each  sub¬ 
problem  onto  the  processor  from  a  parallel  access  opti¬ 
cal  memory,  but  downloading  the  changed  weights  of 
only  one  PE  via  a  low  bandwidth  electronic  link.  The 
resulting  system  is  much  more  versatile  than  a  system 
capable  of  dealing  only  with  problems  of  a  particular 
size.  Furthermore,  it  performs  well  even  with  weights 
truncated  to  4  bits  during  training^  and  requires  no  mul¬ 
tiplications.  Finally,  it  converges  rapidly  and  uniformly 
with  little  dependence  on  the  particular  choice  of  adjust¬ 
able  parameter  values  and  initial  state. 

3.  Background:  fuzzy  ARTMAP  algorithm 

Fuzzy ARTMAP is essentially a clustering algorithm
(vector quantizer), with supervision that redirects
training inputs which would be grouped in an incorrect
category  to  a  different  cluster.  As  illustrated  in  figure  1, 


a  fuzzy  ARTMAP  system  consists  of  two  fuzzy  ART 
modules,  each  of  which  clusters  vectors  in  an  unsuper¬ 
vised  fashion,  linked  by  a  map  field.  (Throughout  this 
paper,  vectors  are  denoted  by  bold  letters.)  Typically, 
the  cluster  first  chosen  by  module  a  is  associated  with 
the module b cluster containing the desired output vector.
However, during training, if the cluster to which
input vector Ik is assigned is incorrect, the map field
signals module a and causes Ik to be assigned to the
cluster next most likely to be correct. The process is
repeated until Ik is assigned to a correct cluster. New
clusters are created as needed.


[Figure 1 labels: input; desired output (during training).]

Figure 1: A fuzzy ARTMAP processor consists of two
fuzzy ART modules linked by a map field.


3.1  Fuzzy  ART  algorithm 

Fuzzy ART clusters vectors based on two separate
distance criteria, match and choice. The match function
is defined by

$$S_j(I_k) = \frac{|I_k \wedge w_j|}{|I_k|},$$

where wj is an analog-valued weight vector associated
with cluster j, ∧ denotes the fuzzy AND operator,
$(p \wedge q)_i = \min(p_i, q_i)$, and the norm |·| is defined by
$|p| = \sum_i |p_i|$. The choice function is defined by

$$T_j(I_k) = \frac{|I_k \wedge w_j|}{\alpha + |w_j|},$$

where α is a small constant. Increasing α biases the
search more towards clusters with large |wj|.

Input vector Ik is assigned to the category which
maximizes Tj(Ik) while satisfying Sj(Ik) ≥ ρ, where
the vigilance, ρ, is a constant, 0 < ρ < 1.

The fuzzy ART learning rule is given by

$$w_{ji}^{\text{new}} = \begin{cases} w_{ji}, & w_{ji} \le I_{ki} \\ \beta I_{ki} + (1-\beta)\, w_{ji}, & w_{ji} > I_{ki} \end{cases}$$

where 0 < β < 1. Only the weights of the cluster to
which Ik has been assigned are updated. All wji are
initially set to 1.

Carpenter  et  al.^  propose  searching  for  the  category 
J  which  maximizes  Tj  and  then  checking  whether  the 
chosen category satisfies Sj(Ik) ≥ ρ. If not, category J
is  marked  as  ineligible,  and  the  search  is  repeated  until  a 
satisfactory  category  is  found.  The  length  of  time 
between  input  presentation  and  selection  of  the  corre¬ 
sponding  cluster  is  variable,  depending  on  how  many 
search  cycles  are  required.  Furthermore,  the  associated 
three-layer  architecture  is  not  well  suited  to  parallel 
implementation  because  it  requires  weight  transport  or 
multiple,  independently  updated  copies  of  the  weights. 
Section  4.1  describes  how  all  of  these  undesirable 
properties  may  be  eliminated. 
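
The fuzzy ART operations of this section can be written out in a few lines. The sketch below follows the standard definitions (fuzzy AND as element-wise minimum, L1 norm, choice and match functions, and the learning rule with rate beta); the function names, the default value of alpha, and the use of plain Python lists are illustrative assumptions. Category selection is written as a single pass that keeps the best eligible node, which gives the same result as the repeated search described above.

```python
def fuzzy_and(p, q):
    """Element-wise minimum of two vectors."""
    return [min(a, b) for a, b in zip(p, q)]


def norm(p):
    """L1 norm of a vector."""
    return sum(abs(x) for x in p)


def choice(inp, w, alpha=0.001):
    """Choice function T_j for input inp and weight vector w."""
    return norm(fuzzy_and(inp, w)) / (alpha + norm(w))


def match(inp, w):
    """Match function S_j for input inp and weight vector w."""
    return norm(fuzzy_and(inp, w)) / norm(inp)


def select_category(inp, weights, rho, alpha=0.001):
    """Index of the eligible category with the largest choice value, or None."""
    best, best_t = None, -1.0
    for j, w in enumerate(weights):
        if match(inp, w) >= rho:  # vigilance test
            t = choice(inp, w, alpha)
            if t > best_t:
                best, best_t = j, t
    return best


def learn(inp, w, beta=0.5):
    """Move each weight towards min(input, weight) at rate beta."""
    return [beta * min(i, x) + (1.0 - beta) * x for i, x in zip(inp, w)]
```

A return value of `None` from `select_category` corresponds to the case where no existing cluster passes the vigilance test and a new cluster has to be created.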

3.2 The map field

The map field is essentially a look-up table, retrieving
an analog-valued weight wJL when module a node J
and module b node L are active. Note that only one
node of each module is active at a given time. If
wJL < ρab, the vigilance of module a, ρa, is raised until
node J becomes inactive (and some other node becomes
active). This process is repeated until wJL ≥ ρab.
When the next input is presented, ρa is returned to its
baseline value. All wjl are initially set to 1. During
learning, when nodes J and L become active and
wJL ≥ ρab, the weights wJl linking node J to the other
module b nodes (l ≠ L) are reduced in value
(typically set to 0).

4.  Optoelectronic  implementation 

4.1 A novel mapping of fuzzy ART onto a suitable
architecture

Fuzzy ARTMAP specifies a precoding scheme, referred
to as complement coding. Given M-dimensional
feature vectors ak, 2M-dimensional input vectors
$I_k = (a_k, a_k^c)$ are generated, where $a_i^c = (1 - a_i)$. The
norm of every input vector, |Ik|, then equals M, the
dimension of ak. The match function becomes
$|I_k \wedge w_j| / M$, and the match criterion becomes
$|I_k \wedge w_j| \ge \rho M$. Fuzzy ART with complement coded
inputs may be mapped onto a neural network consisting
of only two layers, as shown in figure 2. $I_{ki} \wedge w_{ji}$ is
determined at each synapse and the norm (summation of
the synaptic outputs) is performed during fan-in along
output  of  that  node  is  disabled.  Thus  the  match  criter¬ 
ion  is  computed  for  all  nodes  in  parallel,  and  the  search 
procedure  is  carried  out  only  once  per  input  vector,  elim¬ 
inating  the  variable  delay  described  above.  Weight 
updates  are  carried  out  at  each  synapse  using  only  lo¬ 
cally  available  information,  and  no  weight  transport  is 
required. 
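
Complement coding and the resulting parallel match test can be sketched briefly. With complement-coded inputs the norm of the input is always M, so each node only has to compare its own summed synaptic outputs with ρM. The function names and the example values are illustrative assumptions.

```python
def complement_code(a):
    """Append the complement (1 - a_i) of every feature to the feature vector."""
    return list(a) + [1.0 - x for x in a]


def eligible_nodes(inp, weights, rho):
    """Evaluate the match criterion |I ^ w_j| >= rho * M for every node."""
    m = len(inp) // 2  # dimension M of the original feature vector
    return [
        sum(min(i, w) for i, w in zip(inp, wj)) >= rho * m
        for wj in weights
    ]
```

Each entry of the returned list depends only on one node's weights and the shared input, which is why the test can be evaluated for all nodes at once in hardware.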

4.2 Implementation using D-STOP

The  most  computationally  intensive  step  in  the 
fuzzy  ARTMAP  algorithm  is  the  computation  of  the 
Tj(Ik) values in each fuzzy ART module. Whereas
previous  optical  ART  processors^  have  been  limited  to 
the  multiplications  required  by  the  earlier  ART  algo¬ 
rithms,  the  Dual-Scale  Topology  Optoelectronic  Proces¬ 
sor^  (D-STOP)  is  ideally  suited  to  implement  this  oper¬ 
ation.  D-STOP  utilizes  optical  interconnections  and 
electronic  computations,  enabling  it  to  perform  the  nec¬ 


essary  generalized  matrix-vector  multiplication.  The 
optical  system  is  space  invariant,  consisting  of  a  4-f 
imaging  system  that  forms  a  reduced  image  of  the  input 
array,  followed  by  a  single  lens  and  a  computer¬ 
generated  hologram  (CGH).  The  single  lens  images  the 
intermediate  plane  onto  the  output  plane,  and  the  CGH 
replicates  the  de-magnified  image.  Each  D-STOP  PE 
has  ns  optical  inputs,  where  ns  is  the  number  of 
synapses,  and  only  one  output.  One  copy  of  the  input 
array  falls  onto  the  detectors  of  each  PE  in  the  output 
plane.  For  simplicity,  the  output  signals  in  the  imple¬ 
mentation  described  below  are  electrical.  However,  D- 
STOP  is  fully  compatible  with  optical  outputs  as  well. 


Figure 2: Neural architecture for fuzzy ART. In practice,
the inhibitory interconnections (shaded) within
layer  2  are  replaced  by  one  additional  PE  that  determines 
which  layer  2  PE  has  the  maximal  output  value. 

A  complete  system  for  implementing  fuzzy 
ARTMAP,  consisting  of  one  D-STOP  per  fuzzy  ART 
module,  is  illustrated  in  figure  3.  The  module  a  proces¬ 
sor  plane  is  also  interfaced  with  a  parallel  access  optical 
memory.  The  pixelated  output  of  the  memory  is  im¬ 
aged  onto  the  detector  array  of  the  processor  plane.  The 
imaging  system  must  be  tailored  to  the  particular  type 
of  parallel  access  memory  which  is  to  be  used.  The 
map  field  may  be  implemented  using  standard  random 
access  memory  (RAM)  chips  and  minimal  additional 
logic. The total size of the RAM (typically a few kilobytes)
is σ Na Nb bits, where σ is the number of bits of
precision per weight and Na and Nb are the numbers of
PEs in modules a and b, respectively.

Figure  4  is  a  schematic  diagram  of  one  processor 
plane. If the number of PEs required for module a, Na,
is greater than the number of PEs present in hardware,
the  input  vector  may  be  presented  once  and  stored. 
Pages  of  weights  corresponding  to  sub-arrays  of  PEs  are 
subsequently  loaded  onto  the  processor  plane.  Once  the 
maximal  Tj  of  one  set  of  PEs  has  been  determined,  that 
page  of  weights  is  no  longer  needed  and  is  overwritten 
by  the  next  page.  The  single  page  containing  the 
weights,  wj,  corresponding  to  the  cluster  to  which  Ik 
is finally assigned must be loaded again before presentation
of the next input vector in order to update wj.

Fan-in along the H-tree of each processing element
requires O(log ns) time, where ns is the number of
synapses in the PE. Determination of the maximal Tj
along the larger H-tree of the processor array requires
O(log N) time. The throughput is increased by
O(nS/log nS) over a serial implementation, since the
time required by the latter is O(nS), where the total
number of synapses is nS = ns N.
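
The logarithmic fan-in time can be illustrated with a pairwise tree reduction. The sketch below sums a non-empty list level by level and counts the levels, then compares a serial operation count with the number of tree levels for a hypothetical array size. It counts tree levels only and ignores constant factors, and the sizes used are assumptions, not figures from the paper.

```python
import math


def tree_sum(values):
    """Sum a non-empty sequence by pairwise fan-in; return (total, tree levels)."""
    level = list(values)
    depth = 0
    while len(level) > 1:
        paired = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            paired.append(level[-1])  # an unpaired value passes up unchanged
        level = paired
        depth += 1
    return level[0], depth


def speedup(synapses_per_pe, n_pe):
    """Ratio of serial steps to tree levels (PE fan-in plus array fan-in)."""
    serial = synapses_per_pe * n_pe
    parallel = math.ceil(math.log2(synapses_per_pe)) + math.ceil(math.log2(n_pe))
    return serial / parallel
```

For a hypothetical array of 16 PEs with 64 synapses each, the serial count is 1024 steps against 6 + 4 tree levels.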

Other  operations,  such  as  distribution  of  global 
clock  and  control  signals  or  fan-in  of  the  Tj  values 
might  also  benefit  from  (straightforward)  optical  inter¬ 
connections.  However,  we  have  concentrated  on  that 
aspect of the implementation which results in the greatest
increase in performance and the greatest reduction of
circuit  area,  yielding  a  simple,  conservative,  realizable 
scheme  which  relies  only  on  hardware  which  has  been 
demonstrated  in  the  lab*^**. 



Figure  3:  Optoelectronic  fuzzy  ARTMAP  processor. 
Optical  connections  are  represented  by  light  cones.  All 
inputs  may  be  active  simultaneously,  but  the  connec¬ 
tions  of  only  one  input  per  module  are  shown.  Module 
a is shown interfaced to a parallel access optical memory,
drawn schematically as a box. (The actual medium
used might, for example, be an optical disk.) The map
field  consists  of  electronic  logic  and  RAM,  and  connec¬ 
tions  to  the  map  field  are  electrical  lines. 

5.  Conclusion 

To  our  knowledge,  this  is  the  first  design  for  a  par¬ 
allel  implementation  of  the  fuzzy  ARTMAP  algorithm. 
The  proposed  mapping  of  the  algorithm  onto  a  neural 
architecture  is  efficient,  requiring  only  an  input  layer  and 
one  processing  layer  per  fuzzy  ART  module,  and  requir¬ 
ing  neither  weight  transport  nor  multiple  copies  of 
weights.  The  proposed  optoelectronic  system  is  simple, 
yet  versatile,  and  relies  on  proven  components.  Opera¬ 
tions  which  may  be  carried  out  using  standard  electronic 
components  without  loss  of  performance  are  carried  out 
electronically. Computing the generalized matrix-vector
multiplication in parallel results in an O(nS/log nS)
speed-up over a serial computation, where nS is the
number  of  weights  in  the  larger  of  fuzzy  ART  modules 
a  and  b. 


[Figure 4 label: "Synapse" ji (detector and storage)]


Figure 4: Fuzzy ART processor plane layout. Calculation
of |Ik ∧ wj| (summation of the synaptic outputs) is
performed during fan-in along the H-tree of PE j.
Which Tj is largest is determined during fan-in along
the H-tree of the PE array. When interfaced with a parallel
access memory, the same detectors are used to
receive the input values Iki and the weight values wji
in  subsequent  time  steps.  Since  the  weights  of  only 
one  PE  are  modified  for  every  input  vector,  the  changed 
weights  may  be  off-loaded  via  a  low  bandwidth  elec¬ 
tronic  link. 
A holographic 3-D disk consists of a disk-shaped holographic medium and a recording/
read-out head. The head moves in the radial direction while the disk rotates to allow
access to any location on its surface. Multiple holograms are stored at each location using
a plane wave reference and either angle or wavelength multiplexing [1]. Disks can also
be constructed using peristrophic [2] or phase-code multiplexing [3]. No matter which of
the above methods is used, the multiplexing mechanism must be incorporated in the head
along with the CCD, SLM, and passive optical components. In this paper we present a
multiplexing method, shift multiplexing [4], which allows holograms to be superimposed at
one  location  using  only  the  rotation  of  the  disk.  Since  the  mechanical  system  to  rotate  the 
disk  is  already  in  place,  this  multiplexing  method  is  well  suited  for  the  disk  configuration. 
In  addition,  access  to  the  data  is  more  natural  in  the  shift  mode  because  the  continuous 
disk  motion  is  easily  combined  with  successive  hologram  read-out. 

The  shift  multiplexed  holographic  disk  is  shown  in  Fig.  1.  The  structure  is  very  similar 
to  the  angle  multiplexed  disk,  except  the  reference  is  fixed  and  is  either  a  spherical  wave 
or  a  fan  of  evenly  spaced  plane  waves.  The  recorded  hologram  is  reconstructed  with  the 
same  reference  and  the  signal  is  detected  on  the  CCD.  When  the  disk  rotates,  the  recorded 
hologram  is  shifted  with  respect  to  the  stationary  illuminating  reference  beam.  A  relative 
shift of a few microns (~2 µm in some experiments) causes the reconstruction to vanish,
allowing  a  new  hologram  to  be  recorded  in  the  shifted  position.  We  will  describe  the 
physical  mechanism  that  allows  shift  multiplexing  for  the  case  of  a  spherical  wave  first. 

Consider a spherical wave reference originating a distance z0 from the center of the
recording material (Fig. 1). In the paraxial approximation the reference beam is
$R(x) = \exp(j\pi x^2/\lambda z_0)$. The expression for the reconstruction of the shifted hologram
$R^*(x-\delta)S(x-\delta, y)$ is

$$R(x)R^*(x-\delta)S(x-\delta, y) = \exp\left(-j\pi\frac{\delta^2}{\lambda z_0}\right)\exp\left(j2\pi\frac{x\delta}{\lambda z_0}\right)S(x-\delta, y) \qquad (1)$$

The above is a reconstruction of the signal S(x, y), shifted by δ and travelling in a
direction that deviates from the original signal direction by δ/z0. This angular deviation
causes Bragg mismatch in a way completely analogous to the Bragg mismatch caused by
a change in angle of the read-out beam when the reference is a plane wave. The thickness
L of the hologram determines the amount of shift in the x and y directions necessary to
Bragg  mismatch  adjacent  holograms: 

= 

= 

where NA is the numerical aperture of the spherical wave. Experimentally, using an
objective lens of numerical aperture 0.9 at distance z0 = 6 mm to generate the spherical
wave, we observed selectivities δx ≈ 2 µm and δy ≈ 15 µm in an 8 mm thick LiNbO3 crystal.
The theoretical predictions are 0.9 µm and 47 µm respectively. In a similar experiment with
z0 ≈ 3 cm, NA = 0.65 and L = 1 cm we stored 80 image plane holograms separated by 5 µm
from each other. An exposure schedule was used to attain uniformity of the holograms. A
reconstruction of one of the 80 holograms is shown in Figure 2. No evidence of crosstalk
from other holograms was observed.
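
As a check on Eq. (1), the intermediate step can be written out explicitly. This is a sketch that assumes the paraxial reference given in the text, with z0 the distance of the point source:

$$R(x)R^*(x-\delta) = \exp\left[\frac{j\pi}{\lambda z_0}\left(x^2 - (x-\delta)^2\right)\right] = \exp\left[\frac{j\pi}{\lambda z_0}\left(2x\delta - \delta^2\right)\right]$$

The term in δ² is a constant phase. The term linear in x is a plane-wave factor with spatial frequency δ/(λ z0), which corresponds to a tilt of approximately

$$\Delta\phi \approx \lambda \cdot \frac{\delta}{\lambda z_0} = \frac{\delta}{z_0}$$

For the values quoted above (δ ≈ 2 µm, z0 = 6 mm) this gives a deviation of roughly 3.3 × 10⁻⁴ rad.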

Figure 2: (left) Reconstruction of one out of 80 holograms using a spherical wave reference;
(right) shift multiplexing with multiple plane waves: reconstruction of three holograms
(A, B, C).

Shift multiplexing can also be implemented using a fan of M plane wave components
uniformly separated by Δθ as a reference. Upon reconstruction, if the reference beam is
exactly aligned with the composite hologram, individual holograms recorded by different
components are in phase and interfere constructively. A shift δ of the reference relative to
the hologram produces destructive interference due to different phase delays in the reconstructed
components. It can be shown that the diffraction efficiency is proportional to the
array function $\sin^2(\pi M \delta \Delta\theta/\lambda) / \sin^2(\pi \delta \Delta\theta/\lambda)$. A conservative estimate for the number
of holograms that can be multiplexed with this approach is M/2. M is determined by the
numerical aperture of the optics while Δθ must equal the Bragg selectivity λ/(L tan θs). In
one experiment we multiplexed 3 holograms in DuPont’s 38-micron photopolymer using
M = 20 plane waves as reference (Figure 2). The measured shift selectivity of each hologram
was ≈ 5 µm and the periodicity of the array function was ≈ 55 µm. Hence, there is
room for up to 11 holograms (~ M/2) in this case. Using diffractive or holographic optical
elements a reference beam fan with M larger than 1,000 can be realized. Assuming that
the material thickness can also be increased accordingly, a correspondingly large number
of multiplexed holograms can be attained.
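
The array function can be evaluated directly to see where its nulls and repeats fall. The sketch below writes it in terms of the period λ/Δθ, so that the first null lies at period/M and the principal maxima repeat every period. The function name and the handling of the 0/0 limit are illustrative choices; the example numbers reuse M = 20 and the measured 55 µm periodicity from the text.

```python
import math


def array_function(shift, n_waves, period):
    """Evaluate sin^2(pi*M*shift/period) / sin^2(pi*shift/period).

    period is the wavelength divided by the angular spacing of the fan,
    in the same length unit as shift.
    """
    x = math.pi * shift / period
    denominator = math.sin(x) ** 2
    if denominator < 1e-24:
        return float(n_waves ** 2)  # limiting value at the principal maxima
    return math.sin(n_waves * x) ** 2 / denominator
```

With M = 20 and a 55 µm period the ideal function has its first null at 2.75 µm. The measured 5 µm selectivity is broader than this ideal value, which is consistent with the conservative M/2 estimate.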
2-photon absorption based 3-D optical memories


The  2-photon  absorption  based  three  dimensional  memory^  is  an  optical  storage  device  where  the 
bits of information are stored throughout the volume of a material. The use of optical beams for
write/read  operations  allows  the  data  to  be  densely  packed  inside  the  material,  hence  increasing 
the  memory  density.  The  third  dimension  allows  the  information  to  be  accessed  in  parallel 
providing  a  high  data  transfer  rate. 


The  physical  process  enabling  this  memory  is  as  follows:  a  photochromic  molecule  which  is 
embedded  in  a  polymer  host  matrix  is  excited  from  its  ground  state  to  a  higher  energy  state  by 
the  simultaneous  absorption  of  two  photons,  which  may  be  from  two  different  optical  beams  and 
different  wavelengths.  One  of  the  photochromic/polymer  host  systems  that  we  have  characterized 
is spirobenzopyran (SP) in poly(methyl methacrylate) (PMMA). In SP, the simultaneous
absorption  of  the  two  different  colored  photons  results  in  a  bond  dissociation  and  the  structure 
changes  transforming  it  into  a  new  form  with  a  different  absorption  spectrum.  These  two 
different molecular forms are defined as the unwritten and written forms of the memory. Various
bit planes can be stored by intersecting the two optical beams at various locations within the
memory volume. These addressing beams can be arranged to propagate either collinearly or
orthogonal to each other2 (Fig. 1). The written information is read by means of re-emitted
fluorescence.  It  is  possible  to  erase  a  written  bit  by  illuminating  it  with  an  optical  beam  at  around 
532  nm. 


Selecting  the  wavelengths  of  the  optical  beams  is  an  important  issue  since  the  written  and 
unwritten  forms  may  show  different  absorption  cross-section  to  a  certain  wavelength  and 
different  ways  of  addressing  the  memory  have  different  requirements.  A  computer  memory  may 
be  expected  to  carry  data  for  relatively  long  periods  of  time,  hence  the  persistence  of  the  written 
and unwritten forms are important parameters. Another issue is the cyclability of the memory.
Finally,  the  data  uniformity  is  also  dependent  on  the  way  the  memory  is  addressed  and  the 
wavelengths  used. 


Wavelength  selection  for  write  operation 


To prevent previously stored bit locations from being affected while new bit locations are
written, absorption of the write wavelengths by the written bits must be negligible. The
absorption spectra of the written and unwritten forms of the memory volume are shown in Fig. 2.
When the beams are arranged to propagate orthogonally to each other, a beam at 1064 nm and its
second harmonic at 532 nm can be used to address a location and write at that location in the
memory. However, if the beams are arranged to propagate collinearly, the optical beam at 532 nm
will be absorbed by other written bits on the path and eventually cause them to be erased. Thus a
wavelength which is not strongly absorbed by the written form of the memory, such as 450 nm, is
required. While the 1064 nm beam may be absorbed by the written bits via 2-photon absorption,
this process is much less efficient than the 1-photon absorption at 532 nm. Thus, the use of
450 nm and 900 nm writing beams minimizes information erasure during writing.


Persistence  of  written  and  erased  states 

A given memory bit volume is composed of both written and unwritten SP molecules. A memory bit
volume is considered written if a finite number of written molecules (set by various system
constraints) are present within the bit volume. Thermal effects can shift the equilibrium between
the written and unwritten molecules, causing written molecules to relax back to their unwritten
forms and vice versa. Hence, depending on the environmental temperature, the number of written
molecules in a 'written' bit volume may eventually approach the number of unwritten molecules.
The period during which a written bit volume can be detected as 'written' is the persistence of
the written form. For SP, the written form persistence is years at 77 K, months at 3°C, and hours
at room temperature. However, the written form stability at room temperature may be improved by
embedding SP molecules into a polar host matrix, e.g. poly-hydroxy-ethylmethacrylate. Since SP
molecules are polar in their written forms, polar polymers can be used to anchor the two ends of
written SP molecules.

Fatigue  induced  by  write/erase  cycles 

For a material to be useful in a write-read-erase memory device, it should be able to withstand a
large number of cycles (write-read-erase) with minimal deviation from its original
characteristics. In an SP-doped PMMA memory, a small number of SP molecules dissociate upon
excitation and generate a new molecule which is not writable. If, in a given volume, the number
of SP molecules which are written and erased in each operation is large, then the material will
decay quickly. This has been demonstrated by repeatedly writing a memory volume using a UV source
and erasing it with light at 514 nm. By examining the time required to write and erase to
specific optical densities, we can evaluate the accumulation of the unwritable form. Fig. 3 (a)
shows the number of write/erase cycles that can be performed before the material shows
considerable fatigue when 90% of the molecules were written and erased in each cycle. Fig. 3 (b)
shows the same effect but when only 75% of the molecules were written and erased. It is evident
that the number of cycles was improved by decreasing the total number of written and erased
molecules in each cycle. Thus, by appropriately scheduling the usage of the molecules within each
bit volume, arbitrarily high degrees of cyclability may be achieved.

Two-dimensional  data  uniformity 

One way of storing two-dimensional data in the 2-photon absorption based optical memory is to
propagate an image carrying beam orthogonal to an addressing beam which is focused into the
memory as a sheet of light parallel to the data image² (Fig. 1b). The two-dimensional data plane
will be recorded where the two beams intersect. However, since the interaction starts from the
side of the data plane and the photons are absorbed as the addressing beam travels through it, in
a practical system more molecules will be written on one side of the image than on the other.
Similarly, when the read-out beam is brought from the side to traverse the two-dimensional data
plane, the read-out beam will be absorbed as it travels through the data plane. This will cause
nonuniform fluorescence emission along the plane as shown in Fig. 4, where the fluorescence
intensity distribution along 5 two-dimensional data planes at different locations in the memory
is plotted. The data stored in the planes were all 1's and the data was stored uniformly. The
spikes at the end of the curves are due to back reflections from the material-air interface. The
emitted fluorescence nonuniformity can be compensated to a degree by addressing the memory from
one side during the writing and symmetrically addressing the memory from the other side during
the reading.
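
The compensation idea can be illustrated with a deliberately simplified model. The sketch below assumes plain exponential (Beer-Lambert) attenuation of both the write and the read beam, and assumes that the emitted fluorescence is proportional to the local written fraction times the local read intensity. The attenuation coefficient, the plane length, and the function name are illustrative choices, not values from the measurements; the real 2-photon process is more involved, which is why the compensation is only partial in practice.

```python
import math

def fluorescence_profile(absorb, length, n_points, read_from_opposite_side):
    """Relative fluorescence along a data plane under a simple Beer-Lambert model."""
    profile = []
    for i in range(n_points):
        x = length * i / (n_points - 1)
        # The write beam enters at x = 0, so fewer molecules are written deeper in.
        written = math.exp(-absorb * x)
        # The read beam is attenuated over its own path to the same position.
        depth = (length - x) if read_from_opposite_side else x
        read = math.exp(-absorb * depth)
        profile.append(written * read)
    return profile

same_side = fluorescence_profile(0.5, 1.0, 5, read_from_opposite_side=False)
opposite_side = fluorescence_profile(0.5, 1.0, 5, read_from_opposite_side=True)
print(same_side)
print(opposite_side)
```

In this idealized model, reading from the opposite side gives a flat profile because the two exponential factors multiply to a constant, while reading from the same side compounds the falloff.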
Fig. 1 (a) Collinear and (b) orthogonal addressing.

Fig. 2 Absorption spectra of written and unwritten forms.

Fig. 3 Repeated write/erase operation induced material fatigue: (a) with 90% of the molecules in
the memory written; (b) with 70% of the molecules written.

Fig. 4 Fluorescence intensity distribution along 5 data planes at different locations in the
material.
Photorefractive-based optical memory systems have received considerable attention in recent
years. In these systems, numerous volume holograms are stored within a single photorefractive
crystal, with the writing and recall selectivity generally being performed via angle, wavelength
[2,3], or phase multiplexing. Non-photorefractive-based optical memory systems have also been
proposed. In this paper, we present a hybrid system that utilizes a combination of wavelength and
angle multiplexing in a photorefractive medium. Such a hybridization is driven by the limitations
of photonic source generation and manipulation devices, as well as the desire to expand
information throughput rates in optical memory systems via spectral parallelism. Our design is
essentially the summation over wavelength of many angularly multiplexed volume memory systems. We
therefore compare our system's parameters to those of purely wavelength or angle multiplexed
ones.

We begin by considering the limitations of wavelength or angle multiplexed volume holographic
storage systems [1-3]. In a typical wavelength multiplexed system, a single laser source is
expected to tune rapidly over ~100 nm with a precision of ~0.1 nm. This performance is presently
unavailable in a compact package. In a typical angle multiplexed system, an angle tuning device is
expected to direct a laser beam across ~10 deg. with a precision of ~0.01 deg. Modern
acousto-optic deflectors (AODs) are capable of this tuning range when their beam deflection
angles are magnified with optical telescopes, but their space-bandwidth products remain fixed,
and tuning over ~10 deg. requires large numerical aperture optics which are unattractive in a
compact system.

To circumvent these limitations, we propose a hybrid system which utilizes W discrete-wavelength
laser sources and a continuously tunable, broadband angle multiplexing device. In this manner, our
system is composed of multiple angular multiplexing systems, each operating at a unique
wavelength. It therefore has properties similar to a purely angle multiplexed system, but with the
added advantages of spectral parallelism. As in any volume holographic optical memory system,
what must be considered here is a sufficient non-overlap condition between the grating K vectors
of different holograms (|K| = 2π/Λ, where Λ is the fundamental grating spacing for a particular
hologram). In our hybrid system, this means that within any given wavelength a minimum angular
separation, δθ, must exist between holograms to minimize angle cross-talk noise, and across any
given wavelength it limits the system to a maximum angular span, Δθ, to minimize wavelength
cross-talk noise. This concept is presented in Fig. 1, where Λ is plotted for four discrete
wavelengths λ1 = 476.5 nm, λ2 = 488.0 nm, λ3 = 496.5 nm, and λ4 = 514.5 nm as a function of
(continuous) full beam external angle, θ, under conditions of complementary incidence, as shown in
Fig. 2. For this example, if δθ ~ 0.01 deg. and Δθ ~ 2 deg., then the storage of 800 holograms is
achieved. Clearly, utilizing optimized wavelength sources and angular tuning ranges, this number
can


Fig. 1. Conceptual demonstration of our sparse-wavelength angularly multiplexed volume
holographic memory system. [Plot: curves for λ = 0.4765, 0.4880, 0.4965, and 0.5145 μm versus
external angle θ (deg.).]

Fig. 2. Schematic of our experimental geometry.


easily  be  extended  into  the  thousands.  For  example,  with  a  single  spectral-byte  (8  wavelengths)  and  angle 
multiplexing  in  one  dimension  over  ~  6  deg.,  our  hybrid  system  is  capable  of  storing  ~  5,000  holograms,  each  with 
~10^  pixels,  in  a  cubic  centimeter  crystal.  Upon  readout,  the  holograms  can  either  be  addressed  sequentially  in 
wavelength  at  a  particular  angle,  or  spectrally  in  parallel  thus  increasing  the  data  throughput  rate  by  a  factor  of  W 
over  non-hybridized  systems.  As  well,  such  a  system  is  naturally  compatible  with  multiwavelength  information 
processing  wherein  each  wavelength  can  represent  either  a  given  bit  in  a  binary  word  or  a  spectrally  unique 
data-set  carrier  for  each  angle. 
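
The hologram counts quoted above follow from simple bookkeeping: each wavelength contributes one angle-multiplexed stack of roughly (angular span)/(angular separation) holograms. The sketch below reproduces that arithmetic. The function name is illustrative, and the 0.01 deg. separation used for the 8-wavelength case is an assumption carried over from the 4-wavelength example rather than a value stated for it.

```python
def hologram_capacity(num_wavelengths, angular_span_deg, angular_separation_deg):
    """Holograms stored when every wavelength carries its own angle-multiplexed stack."""
    per_wavelength = round(angular_span_deg / angular_separation_deg)
    return num_wavelengths * per_wavelength

# Four discrete wavelengths, 2 deg. span, 0.01 deg. separation.
four_line_example = hologram_capacity(4, 2.0, 0.01)
# One "spectral byte" (8 wavelengths) over 6 deg., assuming the same 0.01 deg. separation.
spectral_byte_example = hologram_capacity(8, 6.0, 0.01)
print(four_line_example, spectral_byte_example)
```

Under these assumptions the second case gives 4800, in line with the quoted figure of roughly 5,000 holograms.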

To demonstrate our hybrid system, we utilized the scheme presented in Figs. 1 and 2, obtaining
our beams from an Ar+ laser. After spectral separation, selection, and recombination, these
ordinarily polarized beams entered a broadband acousto-optic deflector (AOD) collinearly. The AOD
could tune their individual deflections over ~0.85 deg. each, with the respective output
wavelength and angle relations following the equation λj/λk = θj/θk (with the proper set-up
geometry, this added angular content can advantageously decrease the necessary wavelength
separations). Through a beamsplitter, the AOD output face was imaged onto both the reference and
object beam input planes, which were then each imaged into complementary faces of a LiNbO3:Fe
crystal, 8 mm on each side. (This geometry doubled the angular tuning range accessible from our
AOD and eliminated the need for two additional AODs necessary to match the optical frequencies of
the object and reference beams during information storage. While it does place angular
information on the output object beam, the detection plane of this beam will be virtually
unaffected. This is because this beam contains high information content, and will therefore
require imaging anyway, during which the different (multiplexed) image angles will not produce
different overall image positions.) We then utilized θ ~ 90 deg. and δθ = 0.01 deg. to store 100
high resolution holograms with these four wavelengths over a span of Δθ = 0.25 deg. The recall
from this storage is presented in Fig. 3. Fig. 3(a) shows the recall of the image stored at
θ1 = 90.00 deg. and λ1 = 476.5 nm. Fig. 3(b) shows the recall of the image stored at
θ9 = 89.92 deg. and λ2 = 488.0 nm. Fig. 3(c) shows the recall of the image stored at
θ17 = 89.84 deg. and λ3 = 496.5 nm, and Fig. 3(d) shows the recall of the image stored at
θ25 = 89.76 deg. and λ4 = 514.5 nm.

Fig. 3. Experimental results from the storage of 100 holograms. (a) λ1, θ1. (b) λ2, θ9.
(c) λ3, θ17. (d) λ4, θ25.

Our  hybrid  sparse-wavelength  angularly  multiplexed  volume  holographic  memory  system  shows 
significant advantages over other non-hybridized systems, including relaxed demands on optical sources and
components  and  an  overall  increase  in  information  throughput  rates.  We  have  demonstrated  high  resolution, 
compact  storage  of  100  volume  holograms,  and  shown  how  our  system  can  be  capable  of  storing  many  thousands  of 
such  holograms.  In  addition  to  presenting  further  experimental  results  demonstrating  progress  in  our  work,  we  will 
discuss  issues  such  as  effects  on  storage  capacity  due  to  cross-talk  noise  and  crystal  dynamic  range,  as  well  as 
feasibility  arguments. 
One of the most attractive features of an optical memory is its ability to write and read data
in a bit-parallel format, giving rise to theoretically very high (in excess of 1 Gbps) data
transfer rates. However, this potential has not been demonstrated experimentally because of
various inherent technical difficulties associated with existing optical storage techniques. In a
photorefractive or a persistent spectral hole-burning (PSHB) memory, for example, the time
required to record one page varies from a fraction of a second to several seconds. For a page
containing 1000 x 1000 bits of data, this translates to a bandwidth of approximately 1 Mbps,
substantially slower than any existing semiconductor memory.

Coherent time-domain optical memory (CTDOM) has shown promising potential for high-speed data
storage. Fast recording and readout of sequential digital optical data in a CTDOM at 40 Mbps have
been demonstrated, and an I/O bandwidth of several Gbps is predicted for sequential recording.
Recently, we have proposed a practical scheme for parallel data storage in CTDOM. In a
proof-of-principle experiment, four wavelength-multiplexed single-page volume spectral holograms,
generated with black-and-white transparencies, were successfully stored in a single spatial
location at a rate of approximately 43 Kfps. The experimental results project an I/O bandwidth of
the system exceeding 40 Gbps, which would be 400 times faster than that of a semiconductor cache
memory.

Here we report on the further use of the proposed scheme to store a large number of
wavelength-multiplexed single-page spectral holograms. The purpose of this work is threefold:
a) Examine the feasibility of using a spatial light modulator (SLM) for information encoding in
CTDOM. This work is needed because of the large insertion loss introduced by an SLM.
b) Determine the realistic limit on the recording speed of an SLM-based CTDOM. c) Examine the
potential effect of wavelength multiplexing on diffraction efficiency as more holograms are
stored. We have successfully recorded 100 spectral holograms at one spatial address in a
Eu3+:Y2SiO5 crystal by wavelength multiplexing. Despite the large insertion loss from the SLM, a
frame recording speed in excess of 13 Kfps was obtained with the use of a low-power laser.

The experiments were performed on the ⁷F₀-⁵D₀ transition (site 1 at 579.88 nm) of Eu3+:Y2SiO5,
which has an inhomogeneous linewidth of ~4 GHz and a dephasing time of ~1 ms. Recording and
playback were controlled entirely by a computer. In recording holograms, the computer first tuned
the laser to a desired wavelength (or data channel) within the inhomogeneously broadened
absorption line, downloaded a pre-selected frame to an SLM through a frame grabber for
information encoding, and then illuminated the sample with the reference and data pulses. This
procedure was repeated at a different channel until all 100 frames were stored. The data were
later retrieved by illuminating each channel with a read pulse, and the reconstructed images were
detected by a gated intensified CCD camera and digitized by the frame grabber.

The spatial light modulator used is a liquid crystal array taken from a projection television
(InFocus TVT-6000). This array has 480 x 440 pixels and its insertion loss is approximately 97%
for zero-order transmission. Two approaches can be used to compensate for the large loss: to
increase the laser power by a factor of 30, or to increase the length of the data pulse, since
the camera detects time-integrated signals. The former approach is not practical because it would
require a laser power in the vicinity of 10 W. We chose the latter and used a 50 μs long data
pulse with a peak power of only ~7 mW. The reference and read pulses were 14 μs long and biphase
modulated with the 7-bit Barker code to obtain a data channel width of ~1 MHz. The separation
between the reference and data pulses was chosen to be 10 μs which, in principle, could be
reduced down to ~1 μs to increase the recording speed. The recording time was thus 74 μs/frame,
corresponding to a recording speed of 13.5 Kfps. Further details about the experiments can be
found elsewhere.
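
The quoted recording speed can be checked by adding up the pulse timings: the time per frame is the reference pulse, plus the separation, plus the data pulse. The sketch below is a minimal version of that calculation. The function name is illustrative, and the second case simply assumes that the separation is shortened to about 1 microsecond while the other two durations stay unchanged.

```python
def frame_rate_fps(reference_us, separation_us, data_us):
    """Frames per second when one frame takes reference + separation + data pulse time."""
    frame_time_us = reference_us + separation_us + data_us
    return 1e6 / frame_time_us

# Timings used in the experiment: 14, 10, and 50 microseconds.
measured = frame_rate_fps(14, 10, 50)
# Hypothetical case with the pulse separation reduced to about 1 microsecond.
shorter_gap = frame_rate_fps(14, 1, 50)
print(round(measured), round(shorter_gap))
```

With these numbers the 50 microsecond data pulse dominates the frame time, so shortening the separation alone raises the rate by only about 14%.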

Five minutes after the completion of the recording, we read out the data by illuminating each
spectral grating with the read pulse. Figure 1 shows some of the reconstructed images. The 100
stored images were spaced evenly across a spectral window that was ~300 MHz wide and occupied only
7.5% of the inhomogeneous linewidth. Under this condition, we saw neither cross talk nor any
effect on diffraction efficiency, and a nearly constant efficiency of ~10^-3 was measured.




Fig.  1.  Experimental  results  showing  4  out  of  100  reconstructed  spectral  holograms. 

Since there exist two distinct optical sites for the ⁷F₀-⁵D₀ transition in Eu3+:Y2SiO5,⁷ the
above experimental results suggest a minimum storage capacity of 2660 frames per spatial spot (or
1330 frames per optical site) for this one-frame-per-channel approach. We believe that by
reducing the data channel width to ~0.5 MHz with a frequency-stabilized laser, a capacity of over
4000 frames per spatial spot is achievable in Eu3+:Y2SiO5. We can further estimate this capacity
for binary digital data storage. Assume that each page has 1000 x 1000 pixels and each
information bit is represented by a block of 4 x 4 pixels. Under this condition, one would obtain
a capacity in excess of 250 Mbits per spot. Taking into account the spot size (which is
~1.0 mm x 1.0 mm x 7.0 mm),⁵ one would have a density of ~35 Gb/cm³.

The above estimate can be further extended to the system's I/O bandwidth. For a page containing
250 x 250 bits, the achievable data throughput rate would exceed 840 Mbps. A more optimistic
estimate assuming one bit per pixel would yield a bandwidth of over 10 Gbps, which would be 100
times faster than existing semiconductor cache memory.
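
The capacity, density, and bandwidth estimates above can be reproduced with a few lines of arithmetic. The sketch below assumes 4000 frames per spot, a 1000 x 1000 pixel page, a spot volume of 1.0 mm x 1.0 mm x 7.0 mm, and the 13.5 Kfps recording speed reported for this experiment. The function and variable names are illustrative.

```python
def storage_estimates(frames_per_spot, page_pixels, pixels_per_bit,
                      spot_volume_cm3, frame_rate_fps):
    """Capacity, volumetric density, and throughput for a page-oriented memory."""
    bits_per_page = page_pixels // pixels_per_bit
    bits_per_spot = frames_per_spot * bits_per_page
    return {
        "bits_per_page": bits_per_page,
        "bits_per_spot": bits_per_spot,
        "density_bits_per_cm3": bits_per_spot / spot_volume_cm3,
        "throughput_bps": bits_per_page * frame_rate_fps,
    }

spot_volume_cm3 = 0.1 * 0.1 * 0.7  # 1.0 mm x 1.0 mm x 7.0 mm expressed in cm
# One bit per 4 x 4 pixel block, i.e. 250 x 250 bits per page.
blocked = storage_estimates(4000, 1000 * 1000, 16, spot_volume_cm3, 13.5e3)
# Optimistic case: one bit per pixel.
per_pixel = storage_estimates(4000, 1000 * 1000, 1, spot_volume_cm3, 13.5e3)
print(blocked)
print(per_pixel["throughput_bps"])
```

The blocked case gives 250 Mbit per spot, about 36 Gbit/cm³, and about 844 Mbps; the one-bit-per-pixel case gives about 13.5 Gbps.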

In conclusion, we have demonstrated the storage of 100 single-page spectral holograms at a single
spatial location in a Eu3+:Y2SiO5-based CTDOM using an SLM. The large insertion loss introduced
by the SLM has no significant effect on system performance. A frame transfer speed of more than
13 Kfps was obtained with a modest laser power. Experiments to demonstrate the storage of a much
larger number of holograms are underway to fully utilize the entire inhomogeneously broadened
absorption line.
Application of Fourier Optics for Defect Detection in Microelectronics Fabrications


Lawrence  H.  Lin 
Optical  Specialties,  Inc. 


Fourier  optics  offers  a  simple  and  effective  means  for  detecting  defects  in  the  fabrication 
processes  of  semiconductor  devices  or  flat  panel  displays.  Application  to  commercial 
equipment  development  will  be  presented. 
1. INTRODUCTION

We are developing a class of optical phased-array-radar processors which use the large number of
degrees-of-freedom (DOF) available in three-dimensional photorefractive volume holograms to time
integrate the adaptive weights in order to perform beam-steering and jammer-cancellation
signal-processing tasks for very large phased-array antennas[1,2]. For a large broadband
phased-array antenna containing 1000s of array elements, beam steering and jammer cancellation in
a dynamic signal environment represent an extremely demanding signal processing task, well beyond
the capabilities of microelectronic digital signal processing because of the large number of DOF
required for adaptation. The three-dimensional nature of the signal environment (2
angles-of-arrival and frequency) represents a signal processing problem which maps well into a
highly parallel optical processing architecture utilizing photorefractive volume holograms. The
beam-steering and jammer-nulling processor we present uses relatively simple components: two
photorefractive crystals, two single-channel high-speed detectors, and two single-channel
acousto-optic Bragg cells. The bandwidth capabilities of these components approach a GHz, allowing
the processing of wide-band signals. The required number of processor components used for
implementing the adaptive algorithm is independent of the number of elements in the phased array,
in contrast to traditional electronic or acousto-optic approaches[4,5], in which the hardware
complexity of the processor scales in proportion to array size. We describe the two main
subsystems of the processor, the beam-forming and the jammer-nulling subsystems, and present
results demonstrating simultaneous main beam formation and jammer suppression in the combined
processor.

2.  BEAM-FORMING  PROCESSOR 

The  beam-steering  processor  calculates  the  angle-of-arrival  (AOA)  of  a  desired  signal  of  interest  and  steers  the  antenna 
pattern  in  the  direction  of  this  desired  signal  by  forming  a  dynamic  holographic  grating  proportional  to  the 
correlation  between  the  incoming  signal  from  the  antenna  array  and  the  temporal  waveform  of  the  desired  signal.  The 
grating  is  formed  by  repetitively  applying  the  temporal  waveform  of  the  desired  signal  to  a  single  acoustooptic  Bragg 
cell  and  allowing  the  diffracted  component  from  the  Bragg  cell  to  interfere  with  the  optical  mapping  of  the  received 
phased-array  antenna  signal  at  a  photorefractive  crystal  (PRC).  The  diffracted  component  from  this  grating  is  the 
antenna  output  modified  by  an  array  function  pointed  towards  the  desired  signal.  The  only  a  priori  information 
required  for  beam-steering  is  a  reference  waveform  that  correlates  well  with  the  desired  signal. 

Figure 1. Schematic representation of adaptive beam-forming and jammer-nulling phased-array radar
processor.

Figure 2. Frequency spectrum of processor output demonstrating beam formation in the direction of
the broadband signal of interest in (a) for the received signal scenario shown in (b);
(c) measured array function (1 MHz/div, 10 dB/div).

The beam-forming processor is shown schematically in the upper portion of figure 1. The figure
shows a broadband signal of interest and a narrowband jammer incident upon a phased array antenna.
The outputs from the antenna elements are upconverted to the optical domain using electro-optic
phase modulators fed by a common laser and
coupled into optical fibers for delivery to the processor. In the figure each fiber is shown cut
to precisely the same length, thus preserving the same phase relationship between the array
elements of the antenna; however, random lengths do not affect processor operation. The
phased-array antenna and electro-optic upconversion are simulated in the lab using acousto-optic
modulators to represent far-field point sources, allowing several sources to be input into the
processor at different AOAs. A diffuser can be used to simulate the complex phase front that would
result from unequal length fibers, and a Ronchi ruling to simulate the sampled nature of the
phased array and fiber bundle. The optical output of the phased-array simulator and the
diffracted component from the Bragg cell fed with the reference signal are both incident on a
PRC, effectively forming a bank of time-integrating correlators throughout the volume of the
crystal. A strong correlation between the two waves will exist corresponding to a particular time
delay in the Bragg cell and AOA at the antenna. The resulting stationary interference pattern
will build up a holographic grating in the PRC, which will diffract a portion of the simulated
phased-array output to the heterodyne detector. The diffracted component represents the array
output multiplied by the adaptive weights necessary to steer the main beam towards the desired
signal. The diffracted component is separated from the term from the Bragg cell which helped
write the grating by using angle multiplexing in the direction of Bragg degeneracy[7].

Beam formation results are shown in figure 2. Figure 2a shows the frequency spectrum of the
output of the processor after steering the main beam towards the desired signal, and figure 2b
depicts the received radar scenario. As shown in figure 2b, there is a broadband signal of
interest (4 MHz sweep) and a strong narrowband jammer (76.8 MHz) at a different AOA. The
beam-forming processor forms an antenna array function centered on the broadband signal of
interest, while the jammer AOA falls on (in this case) the first antenna sidelobe. After weighting
by the array function, the signals can be projected onto the frequency axis as shown in 2b, which
corresponds to the spectrum analyzer trace in 2a. The measured array function is shown in 2c. It
is important to note that the spatial processing achieved using the temporal correlation property
of the desired signal forms an array function pointed toward the desired signal, which reduces the
jammer power because it is arriving on a sidelobe (a reduction of 13 dB in this case).
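
A reduction of about 13 dB is what one would expect for a jammer sitting on the first sidelobe of a uniformly weighted linear array. The sketch below computes that sidelobe level from the standard array factor. The uniform weighting and the element count of 64 are assumptions made for illustration only; the array function actually formed by the processor is set by the holographic grating and need not be exactly uniform.

```python
import math

def array_factor(psi, n_elements):
    """Normalized magnitude of a uniformly weighted linear array factor."""
    denom = n_elements * math.sin(psi / 2)
    if abs(denom) < 1e-12:
        return 1.0  # main-beam (or grating-lobe) peak
    return abs(math.sin(n_elements * psi / 2) / denom)

def first_sidelobe_db(n_elements, samples=10001):
    """Peak of the first sidelobe in dB relative to the main-beam peak."""
    first_null = 2 * math.pi / n_elements
    second_null = 4 * math.pi / n_elements
    # Search the interval between the first two nulls on a fine grid.
    peak = max(
        array_factor(first_null + (second_null - first_null) * i / (samples - 1), n_elements)
        for i in range(samples)
    )
    return 20 * math.log10(peak)

sidelobe_db = first_sidelobe_db(64)
print(sidelobe_db)
```

For large element counts this level approaches the familiar value of roughly -13.3 dB, so it depends only weakly on the assumed array size.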

3. JAMMER-NULLING PROCESSOR

The jammer-nulling processor computes the AOAs of multiple interfering narrowband radar jammers
and adaptively steers nulls in the antenna pattern in order to extinguish the jammers by
implementing a modified least-mean-squared (LMS) algorithm in the optical domain. The
jammer-nulling processor is shown schematically in the lower right of figure 1. The detected main
beam signal is sent through a delay and applied to the feedback acousto-optic Bragg cell. The
diffracted signal from the AOD is incident upon the PRC, as is the optical signal from the phased
array. Narrowband signal components from the phased array will be well correlated with the jammer
components of the delayed and fed-back version of the main beam signal diffracted from the Bragg
cell and will produce a stationary interference pattern at the PRC, while the broadband signals
will not. This stationary interference pattern will begin forming a volume holographic grating in
the crystal which is Bragg matched for a particular frequency jammer at a particular AOA. As the
grating forms, a portion of the jammer in the phased-array output is diffracted off the grating
and heterodyne detected, forming a jammer estimate. The jammer estimate is amplified and
subtracted from the main beam signal, producing a processor output with reduced jammer content.
The feedback signal now contains less jammer content and the grating builds up more slowly. The
system converges when the jammer content has been reduced by the net gain around the feedback
loop. Broadband signals of interest are decorrelated by the finite delay around the loop; they
therefore do not write gratings in the PRC and are not nulled.

The  dynamic  behavior  of  the  jammer  excision  has  been  modeled  using  an  equation  describing  temporal  evolution  of 
the  jammer  signal  around  the  feedback  loop[3,6].  For  a  single  plane  wave  jammer  the  temporal  behavior  of  the 
complex  valued  excision  can  be  described  by  the  normalized  nonlinear  dynamical  equation 


g^E  =  a-[a  +  +  \Ef  - E\Et 


(1) 
with  =  a  =  l/(/c4^Ti'5R')  b  =  gR^pl[a^I^x\^  (2)

where the three $I$ terms are the intensities of the phased-array beam, the main beam optical heterodyne reference signal,
and the optical input to the feedback Bragg cell, respectively. $\mathcal{R}$ is the responsivity of the photodetector, $\eta$ is the
efficiency of the Bragg cell, $\alpha$ and $\beta$ are the write and erasure proportionality constants of the PRC, $g$ is the
electronic gain around the feedback loop, and $\tau$ is the unwanted delay due to the feedback acousto-optic device
transducer. Under the assumption that the excision remains small and near the center frequency,
i.e. $-\pi/2 < \tau(\omega_c - \omega_j) < \pi/2$, (1) can be linearized to obtain an analytical solution describing jammer
characteristics such as convergence time and steady-state suppression depth for both single and multiple jammers.
Allowing for incident jammers with arbitrary spatial profile, the vector-modal solution of the excision $E_j$ for the $j$th
jammer with amplitude $A_j$ is given by [6]

^  (3) 

where we have defined the system center frequency as $\omega_c$ and the total interference power as . The initial decay
rate and the steady-state suppression level are given by

j^Ej  (4)  and  =a{A^-^l[a  +  (A]- (5) 

The  results  obtained  from  the  analytical  solutions  are  analogous  to  those  expected  from  any  LMS  type  algorithm. 
For  example,  from  (5)  it  is  found  that  a  stronger  jammer  is  suppressed  more  rapidly  and  to  a  larger  extent  than  a 
weaker  jammer.  Various  multiple  jammer  scenarios  have  been  demonstrated  experimentally  and  are  reported 
elsewhere [6,7], and typical experimental single-jammer suppression levels are currently approximately 30 dB [2,3].
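
The LMS analogy can be illustrated numerically. The following is a minimal sketch of a single-tap leaky LMS canceller, not the photorefractive model of the paper: the step size `mu`, the leak term `leak`, and the tone frequency are hypothetical values chosen for illustration. For this toy model the steady-state residual fraction is leak / (leak + A^2), so a stronger jammer (larger amplitude A) is cancelled both faster and more deeply.

```python
import numpy as np

def leaky_lms_residual(amplitude, mu=0.01, leak=0.1, steps=2000):
    """Single-tap leaky LMS canceller for one complex tone.

    Returns |error| / amplitude at every step (the fraction of the
    jammer that is still present at the output).
    """
    n = np.arange(steps)
    x = amplitude * np.exp(1j * 0.3 * n)   # reference input: the jammer tone
    d = x.copy()                           # primary input: contains the same jammer
    w = 0.0 + 0.0j                         # adaptive weight (plays the role of the grating)
    residual = np.empty(steps)
    for k in range(steps):
        e = d[k] - w * x[k]                # output after subtracting the jammer estimate
        residual[k] = abs(e) / amplitude
        # leaky LMS update: decay (erasure) plus correlation-driven growth (writing)
        w = (1.0 - mu * leak) * w + mu * e * np.conj(x[k])
    return residual

weak = leaky_lms_residual(1.0)    # weaker jammer
strong = leaky_lms_residual(3.0)  # stronger jammer
```

Comparing `weak` and `strong` at an early step shows the faster initial decay for the stronger jammer, and comparing their final values shows the deeper steady-state suppression.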


4.  SIMULTANEOUS  BEAM-FORMING  AND  JAMMER-NULLING  RESULTS 


Results  demonstrating  simultaneous  beam  formation  and  jammer  suppression  are  shown  in  figure  4.  Figure  4a  is  the 
frequency  spectrum  of  the  processor  output  shown  in  3a  after  additional  jammer  nulling.  After  nulling  the  jammer  is 
suppressed  by  an  additional  20  dB  from  the  13  dB  due  to  the  fact  that  it  arrived  on  an  antenna  sidelobe,  as  depicted  in 


Figure 3. (a) Frequency spectrum of processor output showing broadband signal and jammer on antenna sidelobe after
jammer nulling, demonstrating additional 20 dB of suppression. (b) Scenario depicted in AOA/frequency space.


5.  CONCLUSIONS 

We have designed, analyzed and experimentally demonstrated a photorefractive optical processor with a large
number of degrees of freedom (DOF) for very large phased-array antennas. The processor adaptively forms an antenna
array function and steers the main lobe in the direction of a desired signal of interest, then adaptively rotates the
nulls of the antenna function to suppress narrowband jammers incident on antenna sidelobes.
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1.  Introduction 

The bandwidth and the efficiency of fiber optic communication systems exceed those of
electrical cable systems. However, we are presently far from realizing the potential performance
of optical networks. Electronic devices and systems connected to optical networks may reach
bit-rates on the order of 1 Gb/s. In contrast, the maximum bit-rate of a photonic network may
exceed 1 Tb/s, limited by the performance of the optical fiber. The three order-of-magnitude
mismatch between fiber and device capacity can be used to increase the speed, security, and
reliability of data transmission. Several all-optical methods exploiting this bit-rate mismatch
are being investigated for controlling data streams in communication channels to utilize this
bandwidth more efficiently. These approaches may include mutual conversions such as space-to-time,
space-to-frequency, spatial frequency-to-time, and spatial frequency-to-temporal frequency.
The possibility of coupling optical images or image-like parallel data into an optical fiber has
been demonstrated by using a pair of moving gratings to introduce spatial-to-temporal encoding^1.
In this manuscript we introduce a holographic method that allows parallel-to-serial (i.e., space-to-time)
optical signal conversion by encoding the spatial frequency spectrum of the parallel optical
signals onto the temporal frequency spectrum of optical pulses. Moreover, by combining our
technique with existing serial-to-parallel conversion methods^’^ we demonstrate the possibility of
transmitting parallel optical signals over a long-distance optical fiber network.

2.  Description  of  the  parallel-to-serial  and  serial-to-parallel  optical  processors 

Our approach for parallel-to-serial optical data conversion is based on combining optical
information processing that uses spectral holography^1’^ with that of conventional spatial Fourier
transform holography. The all-optical parallel-to-serial conversion processor is shown
schematically in Fig. 1a. The processor consists of two independent optical channels for carrying
the temporal and the spatial information. The temporal information carrying channel consists of
a pair of gratings and a 4-F lens arrangement. The incident pulses are transformed by the input
grating and the first lens into a temporal frequency spectrum distributed in space across the focal
plane, while the second lens and the output grating perform the inverse transformation of
the temporal spectrum distribution back to the time domain. The spatial information carrying
channel is a simple optical spatial Fourier transform arrangement consisting of the input image
plane and a beamsplitter to share the second lens of the temporal channel. To achieve interaction
between the temporal and spatial frequency spectrum information we use a real-time
holographic material in a four-wave mixing arrangement. For our initial experiment we used a 1
mm thick photorefractive crystal of LiNbO3. A 1-D binary input image (or a 1-D spatial light
modulator) is illuminated by a monochromatic optical source and Fourier transformed into the plane
of the real-time holographic material, where it interferes with the plane reference wave. The
interference pattern, via the photorefractive effect, records a spatial Fourier transform
hologram. The recorded spatial Fourier transform hologram is reconstructed by the temporal
frequency spectrum of a femtosecond pulse with a center wavelength close to that of the
monochromatic source used for the recording process. Note that the temporal frequency
spectrum is spatially distributed along the transverse coordinate of the hologram plane.
Therefore, the diffracted temporal frequency spectrum is modulated by the spatial frequency
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spectrum  of  the  hologram.  Upon  transmission  through  the  second  lens  and  the  output  grating,  the 
diffracted  temporal  frequency  spectrum  results  in  a  sequence  of  short  pulses  which  exhibit  one- 
to-one  correspondence  with  the  1-D  spatial  distribution  in  the  input  image.  Note,  that  the 
resultant  sequence  of  temporal  pulses  is  carried  by  a  single  beam  which  can  be  easily  coupled 
into  an  optical  fiber  link.  For  decoding  of  the  temporal  information  at  the  receiver  node  we  also 
need  to  transmit  a  single  reference  pulse. 
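
The space-to-time mapping described above can be sketched numerically. This is an idealized model under stated assumptions, not the experimental system: the hologram is taken to store the exact spatial Fourier spectrum of the input, the scale factor between temporal and spatial frequency is set to one, and finite aperture and Bragg selectivity are ignored. The function name, bit spacing, and pulse width are hypothetical.

```python
import numpy as np

def parallel_to_serial(bits, samples_per_bit=32, pulse_sigma=3.0):
    """Idealized parallel-to-serial conversion of a 1-D binary image.

    Returns the magnitude of the output field versus time and the sample
    indices at which each bit is expected to appear.
    """
    n = len(bits) * samples_per_bit
    centers = np.arange(len(bits)) * samples_per_bit + samples_per_bit // 2

    # 1-D input image: a narrow bright slit at the center of every "on" cell
    image = np.zeros(n)
    image[centers[np.asarray(bits, dtype=bool)]] = 1.0

    # The hologram stores the spatial Fourier spectrum of the image
    hologram = np.fft.fft(image)

    # Short readout pulse centered at t = 0 (periodic grid); the input
    # grating and first lens spread its temporal spectrum over the hologram
    t = np.arange(n)
    t_wrapped = np.minimum(t, n - t)
    pulse_spectrum = np.fft.fft(np.exp(-0.5 * (t_wrapped / pulse_sigma) ** 2))

    # Diffraction multiplies the two spectra; the second lens and output
    # grating transform the product back to the time domain
    output = np.fft.ifft(pulse_spectrum * hologram)
    return np.abs(output), centers

bits = [1, 0, 1, 1, 0, 0, 1, 0]
field, centers = parallel_to_serial(bits)
decoded = [int(v > 0.5) for v in field[centers]]
```

In this model the output is the input pattern convolved with the readout pulse, so each bright pixel of the image becomes one short pulse in the serial stream.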

At the receiver node we need to perform an inverse serial-to-parallel transformation.
Such a transformation can be realized with spectral holography of the sequence of temporal
pulses and a reference pulse, as shown schematically in Fig. 1b^-^. The recorded spectral
hologram is reconstructed using a monochromatic plane wave, resulting in a diffracted wave that
is modulated by the spatial frequencies of the spectral hologram. Upon transmission through the
spatial Fourier transform lens, the diffracted wave results in a 1-D image which exhibits one-to-one
correspondence with the sequence of the incident short pulses. Therefore, transmission of
images and image-format data can be achieved.

3.  Experimental  results 

In the experiments we used 150 fsec optical pulses at a wavelength of 480 nm, generated
from a mode-locked Ti:Sapphire laser and a frequency-doubling BBO crystal. To satisfy the Bragg
matching conditions required by volume holography in a 1 mm thick LiNbO3 photorefractive
crystal, we used the 488 nm line from a monochromatic CW argon laser. During
these experiments the output pulses from the system shown in Fig. 1a were transmitted directly
to the input of the system shown in Fig. 1b. In order to assure that there was no spatial
information carried by the transmitted signal pulses, spatial filtering was performed to eliminate
higher spatial frequencies. Alternatively, the output and the reference pulses can be transmitted
through two identical optical fibers or through a single fiber using polarization multiplexing. A
1-D binary input image (see Fig. 2a) was used in our experiments for parallel-to-serial and
serial-to-parallel conversion (see Fig. 2b) employing the processors shown in Fig. 1a and 1b,
respectively. The transmitted image in Fig. 2b shows exact correspondence to the original image
in Fig. 2a.

4.  Conclusions 

In conclusion, we have introduced and experimentally demonstrated optical processors
that perform parallel-to-serial and serial-to-parallel data conversion for 1-D images and
image-format data transmission. This approach is suitable for long distance communication of parallel
information through all-optical fiber networks. In the future we are planning to use fast nonlinear
optical materials such as photorefractive semiconductor crystals and semiconductor
microstructures to provide high speed operation.
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Pulse  beam 
CW  beam 


Fig. 1 Schematic diagram of optical processors for (a) parallel-to-serial conversion and
(b) serial-to-parallel conversion


Fig. 2 Experimental result of image transmission using parallel-to-serial and serial-to-parallel
conversion; (a) the original 1-D image, (b) the received 1-D image.




The Fractional Fourier Transform in Optics:

Do we need it? Is it useful?

Adolf W. Lohmann, Weizmann Institute of Science,

Dept. of Physics of Complex Systems,

Rehovot 76100, Israel, Tel. 972-8-342051, Fax. 4109.

David Mendlovic, Tel Aviv University, Fac. of Engineering,

Tel Aviv 69978, Israel.

Haldun M. Ozaktas, Bilkent University, Electr. Eng. Dept.,

Bilkent 06533, Ankara, Turkey

MOTIVATION

Several respected colleagues and anonymous reviewers asked us: is the fractional Fourier
transform (FRT) more than a modified Fresnel transform (FRS)? Our answer is yes,
the FRT is in our opinion a worthy addition to the class of transformations in optics.
To understand our belief it might be useful to recall why we invented the optical
FRT, actually twice, at first in the context of GRIN fiber optics (1) and then as a linear
centerpiece to sketch how the FRT could have been invented as a special case of A. E.
Siegman's integral transform, or as a special case of J. Shamir's operator optics. Those
two authors could easily have invented the FRT, if the need to do so had arisen. We
mention those two almost-inventions in order to present family features of various
optical transforms. Furthermore, this sideline of our arguments is useful as preparation
for answering the question: which one of all those transforms is most fundamental? We
will propose four criteria for measuring the fundamentality. In our opinion, all four
criteria are subjective in nature. In other words, a statement like "transform A is merely
a modification of transform B" has no universal validity.

What counts is whether the FRT is useful for something. For us that has been the case on
eight occasions, where the FRT-based approach led to more insight and to some
inventions. This statement is admittedly subjective, since every insight and invention
could have been made without the help of the FRT. But those events did occur with
the help of the FRT. We will briefly mention those events.

The  GRIN  Approach  to  the  FRT 


Mathematics became much more useful when integer numbers were broken up into
real numbers. Hence, the two junior authors looked for mathematical
procedures that are relevant for optics and which are characterized by an integer number.
The Fourier transform, which is often applied to diffraction and to imaging, is used
once, or twice in cascade. Apparently, there occurs an index with the integer values one or
two. Hence, let us break the Fourier transform into parts, they said. The breaking into
pieces can be done literally, since the classical Fourier transform (FOU) is executed
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optically by a piece of GRIN fiber of proper length Li. Reducing that length to PLi (P
is a real number) provides a tool for performing the FRT with index P.
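
The GRIN picture can be sketched numerically. The following is an illustrative model under stated assumptions, not the authors' derivation: the guide is taken to have a parabolic index profile, whose eigenmodes are Hermite-Gauss functions, and the modes are obtained from a finite-difference approximation on a hypothetical grid (`n`, `width`). Propagating through a fraction P of the Fourier-transforming length gives mode number m a phase of about exp(-i P m pi/2), which makes the index P additive.

```python
import numpy as np

def grin_frt_operator(order, n=128, width=16.0):
    """Matrix that applies a fractional Fourier transform of index `order`,
    built from the sampled eigenmodes of a parabolic-index (GRIN) guide."""
    x = np.linspace(-width / 2, width / 2, n)
    dx = x[1] - x[0]

    # Finite-difference form of -(1/2) d^2/dx^2 + x^2/2 (symmetric tridiagonal)
    main = 1.0 / dx**2 + 0.5 * x**2
    off = -0.5 / dx**2 * np.ones(n - 1)
    operator = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

    # Columns of `modes` approximate sampled Hermite-Gauss functions;
    # `levels` are approximately m + 1/2 for the low-order modes
    levels, modes = np.linalg.eigh(operator)

    # Each mode accumulates a phase proportional to its mode number and to
    # the propagated fraction `order` of the Fourier length
    phases = np.exp(-1j * order * (np.pi / 2) * (levels - levels[0]))
    return (modes * phases) @ modes.T

a = grin_frt_operator(0.3)
b = grin_frt_operator(0.7)
full = grin_frt_operator(1.0)   # approximates the ordinary Fourier transform
```

Cascading two pieces of fiber corresponds to multiplying the two matrices, so `a @ b` reproduces `full` up to rounding error.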


The Wigner Approach to the FRT


The GRIN approach will put the Hermite-Gauss polynomials at the center spot of
the theory, since these polynomials are directly related to the eigenmodes of GRIN
fibers. The senior author prefers plane waves, the eigenmodes of free space
propagation, as elementary waves. Being used to visualizing optical phenomena in
Wigner space, he realized that whatever the FRT does to a signal is equivalent to a
rotation in Wigner space. That may seem quite abstract, but it is mathematically
nothing more than a replacement of the Wigner coordinates, such as:

(x, y) -> (x cos A - y sin A, y cos A + x sin A).

All  three  authors  found  soon  that  both  approaches  are  strictly  equivalent. 


The  conceivable  AES  approach 


A. E. Siegman (3) and others as well presented a universal linear integral transform with
an argument of the form: 2πi(ax^2+by^2-2Txy). This "mother transform" contains
Fourier if (a=0, b=0, T=1) and Fresnel if (a=b=T) as special cases. The FRT emerges if
a=b and a/T=cos(Pπ/2). Hence, the FRT is simply another special case of Siegman's
transform, which is also true of FOU and FRS. Siegman's categorization of linear
transforms in optics deserves the attribute top down.
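
The three special cases can be written out explicitly. The lines below are a worked sketch that assumes the kernel has the form $\exp[2\pi i(ax^2+by^2-2Txy)]$ with constant prefactors omitted; the normalization $T = 1/(2\sin\phi)$ used in the last step is an assumption made here to recover a commonly quoted form of the FRT kernel.

```latex
K(x,y) \propto \exp\!\left[ 2\pi i \left( a x^2 + b y^2 - 2 T x y \right) \right]

\begin{aligned}
a = b = 0,\ T = 1 &: \quad K \propto \exp\!\left( -4\pi i\, x y \right)
  && \text{(Fourier, up to coordinate scaling)} \\
a = b = T &: \quad K \propto \exp\!\left[ 2\pi i\, T\, (x - y)^2 \right]
  && \text{(Fresnel: depends only on } x - y \text{)} \\
a = b = T\cos\phi,\ \phi = P\pi/2 &: \quad
  K \propto \exp\!\left\{ 2\pi i\, T \left[ (x^2 + y^2)\cos\phi - 2xy \right] \right\}
  && \text{(FRT of index } P \text{)}
\end{aligned}

T = \frac{1}{2\sin\phi} \;\Rightarrow\;
K \propto \exp\!\left\{ i\pi \left[ \frac{x^2 + y^2}{\tan\phi} - \frac{2xy}{\sin\phi} \right] \right\}
```

For $P = 1$ the quadratic terms vanish and the Fourier case is recovered, which is consistent with the FRT interpolating between the identity and the ordinary Fourier transform.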


The  conceivable  JS  approach 

J. Shamir's strategy (4) deserves the attribute bottom up. He defines elementary
operators such as lens, free space, FOU, magnification and so on. Already two of those
elementary operators, notably lens and FSP, are sufficient for synthesising all other
operators, including FOU, FRS, FRT and AES. (FSP and FRS are identical in
paraxial approximation.) A few examples are:

AES=LENS FOU LENS
AES=LENS FRS LENS
AES=LENS FRT LENS
FRT=LENS FRS=FRS LENS
FRT=LENS AES=AES LENS

Certain  parameters,  like  distance  and  focal  length,  are  neglected  in  those  symbolic 
shorthand  statements. 
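
How lens and free-space operators combine into an FRT can be checked with paraxial ray-transfer (ABCD) matrices. This is a minimal sketch, not the operator algebra of (4): it uses a free space, lens, free space cascade with distance d = f1 tan(phi/2) and focal length f = f1 / sin(phi), where phi = P pi/2. These parameter choices and the scale length `f1` are assumptions introduced here for illustration. The cascade then equals a rotation by phi in scaled phase space.

```python
import numpy as np

def free_space(d):
    """Ray-transfer matrix for paraxial propagation over distance d."""
    return np.array([[1.0, d], [0.0, 1.0]])

def lens(f):
    """Ray-transfer matrix for a thin lens of focal length f."""
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])

def frt_cascade(order, f1=1.0):
    """Free space, lens, free space cascade for FRT index `order` (0 < order < 2)."""
    phi = order * np.pi / 2
    d = f1 * np.tan(phi / 2)
    f = f1 / np.sin(phi)
    return free_space(d) @ lens(f) @ free_space(d)

def phase_space_rotation(order, f1=1.0):
    """Rotation by phi = order * pi / 2 in (position, angle) space scaled by f1."""
    phi = order * np.pi / 2
    return np.array([[np.cos(phi), f1 * np.sin(phi)],
                     [-np.sin(phi) / f1, np.cos(phi)]])

half = frt_cascade(0.5)
two_stage = frt_cascade(0.3) @ frt_cascade(0.7)
```

For index 1 the cascade reduces to the familiar 2-f Fourier transform arrangement, and cascading indices 0.3 and 0.7 gives the same matrix as a single stage of index 1.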

How  fundamental  is  the  FRT? 

"Fundamental"  can  mean  different  things  to  different  people.  For  example  historical 
sequence could be applied to measure the "degree of fundamentality" of a certain
approach. Good for Huygens. Another criterion could be teaching sequence. That
would place Wigner far behind Fresnel. But one may ask: are light rays more
fundamental than waves, simply because ray optics is usually taught before wave
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optics?  If  basic  effects  are  more  fundamental  than  complete  optical  transform  setups, 
then FRS and lens are as fundamental as Adam and Eve. If the degree of convenience
is taken as the measure of fundamentality, then AES would probably pick the AES
transform, JS the operator formalism and the senior author most often the Wigner
approach, or the FRT.

It was perhaps an interesting exercise to speculate which approach to wave optics is
most fundamental. But what counts, when discussing the prime question of this study,
"is there a place for the FRT?", is ultimately:

The  usefulness  of  the  FRT 

Roughly two dozen papers related to the FRT have appeared so far. What are the
benefits?
New understanding of GRIN fiber optics
Significance of Wigner rotation
Chirp noise suppression
Space-variant correlation
Radon transform of Wigner
New version of resonator theory
Simplified design of lenses in cascade
A simple zoom lens, called "fake zoom"

We  are  fully  aware,  that  today,  the  FRT  cannot  satisfy  every  demand.  But  remember, 
the  FRT  is  still  very  young. 

References: 

(1) H. M. Ozaktas, D. Mendlovic, J. Opt. Soc. Am. A 10 (1993) 1875.

(2) A. W. Lohmann, J. Opt. Soc. Am. A 10 (1993) 2181.

(3) A. E. Siegman, "Lasers", Ch. 20.6, p. 805.

(4) M. Nazarathy, J. Shamir, J. Opt. Soc. Am. 72 (1982) 1398-1408.




Optical Wavelet Processor for Target Detection
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California Institute of Technology
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Wavelet  transform  has  been  widely  applied  to  time-frequency  signal  analysis,  image 
processing (enhancement, feature extraction, etc.), and target detection. Since the wavelet
transform is a convolution process between an input and a large number of wavelet bases,
the computation load increases nonlinearly with the sizes of the input and the wavelet. Near
real-time optical wavelet transform [1-3] could be accomplished by using an optical
correlator architecture. The processing speed of the optical wavelet transform is independent of
the size of the wavelet filter and is limited only by the updating speed of the spatial light
modulator.

In this paper, we describe two innovative techniques for optically synthesizing
wavelets using liquid crystal television spatial light modulators (LCTV SLMs): a 2-D
Morlet wavelet and a ternary-valued, shape-discriminant wavelet. The 2-D
Morlet wavelet is synthesized using two SLMs for continuous amplitude and binary phase
modulation. The ternary wavelet is synthesized using only a single SLM. These wavelet
filters have also been inserted into an optical correlator and demonstrated for target
detection with improved discrimination over that of conventional correlation using a
matched filter.


In  a  previous  paper  [4],  we  have  developed  a  2-D  modified  Morlet  wavelet  filter  and 
demonstrated  its  feature  extraction  ability  for  target  detection. 


The 2-D Morlet wavelet [3-5] can be written in terms of angular orientation θ as

«  exp{i^o  • 

This 2-D Morlet wavelet consists of an amplitude and a phase component. In order to
optically synthesize the Morlet wavelet, an SLM capable of modulating both the amplitude
and phase information is required. We have developed a technique capable of controlling
both the amplitude and phase modulations independently using two LCTV SLMs.

As  shown  in  Fig.  1,  two  LCTV  SLMs  are  cascaded  to  implement  the  Morlet  wavelet.  The 
first  LCTV  SLM  is  used  to  generate  the  linear  phase  modulation  and  the  second  SLM  is 
used  as  an  amplitude  modulator  to  generate  the  continuous  Gaussian  envelope.  To  ease 
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experimental  implementation,  the  linear  phase  modulation  can  be  simplified  into  binary 
phase  such  that  it  could  be  directly  written  into  the  LCTV  SLM. 
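
The split into a binary phase pattern and a continuous amplitude pattern can be sketched as follows. This is an illustration under stated assumptions: the wavelet is taken in the standard form of a Gaussian envelope times a plane-wave carrier at orientation `theta`, which may differ from the modified Morlet wavelet used by the authors, and the grid size, carrier frequency `k0`, and envelope width `sigma` are hypothetical.

```python
import numpy as np

def morlet_two_slm(n=128, k0=0.5, theta=0.0, sigma=12.0):
    """Patterns for a two-SLM synthesis of a binary-phase Morlet wavelet.

    Returns (amplitude pattern, binary phase pattern, combined field).
    """
    coords = np.arange(n) - n // 2
    x, y = np.meshgrid(coords, coords)

    # Amplitude-mode SLM: continuous Gaussian envelope
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))

    # Linear carrier phase along the orientation theta
    carrier = k0 * (x * np.cos(theta) + y * np.sin(theta))

    # Phase-mode SLM: the linear phase is reduced to two levels, 0 and pi,
    # according to the sign of the real part of the carrier
    binary_phase = np.where(np.cos(carrier) >= 0.0, 0.0, np.pi)

    field = envelope * np.exp(1j * binary_phase)
    return envelope, binary_phase, field

envelope, binary_phase, field = morlet_two_slm()
```

With a binary phase the combined field is real-valued: it equals the envelope multiplied by +1 or -1, which is what allows the phase pattern to be written directly into a two-level phase modulator.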


[Figure labels: Polarizer 1, Polarizer 2, Polarizer 3; LCTV SLM (Binary Phase Mode), LCTV SLM (Amplitude Mode)]

Fig. 1. 2-D Morlet wavelet optical synthesis using two cascaded LCTV SLMs.

Ternary-valued  Shape-discriminant  Optical  Wavelet  Synthesis 

To further improve the shape discrimination capability of a wavelet filter, we have also
developed a ternary-valued shape-specific wavelet. For an arbitrarily shaped geometric
object $g(x,y)$, a corresponding shape-specific wavelet $G(x,y)$ can be written as:

$$G(x,y) = 2\,g(x,y) - g\!\left(\frac{x}{\sqrt{2}}, \frac{y}{\sqrt{2}}\right) \qquad (2)$$

then

$$\iint G(x,y)\,dx\,dy = 0 \qquad (3)$$

Although G(x,y) does not satisfy all the mathematical definitions of a wavelet, it does
satisfy the admissibility condition. This zero-mean characteristic makes it particularly
useful for target discrimination applications. As shown in equations (2) and (3), a positive
real binary input object could be converted to a corresponding binary bipolar wavelet. To
optically synthesize this wavelet function, the background should be opaque. Thus, a
ternary-valued synthesis technique is highly desirable. As shown in Figure 2, we have
developed such a ternary wavelet, consisting of values of ±1 and 0, using a single LCTV
SLM [5,6].


Fig. 2. The polarization orientation of the three gray levels (0, 73, 255) used to drive an
Epson LCTV SLM to obtain the corresponding output ternary states (+1, -1, and 0).
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Figure 2 shows that three input gray levels, 0, 73 and 255, are selected to provide the desired 0
or π phase and the dark state modulation, respectively. During the experiment, the
analyzer is oriented 90 degrees from the output polarization state corresponding to the
input 255 gray level. A triangular shaped ternary wavelet is generated using the settings
described above. The wavelet and the corresponding power spectrum are shown in Figures
3a and 3b.
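
The construction of a ternary wavelet and its SLM drive image can be sketched as follows. This is a minimal illustration, assuming the shape-specific wavelet has the form G(x,y) = 2g(x,y) - g(x/sqrt(2), y/sqrt(2)) for a binary object g, and using the gray levels 0, 73 and 255 quoted in the text for the +1, -1 and 0 states. The disk-shaped test object, the grid size, and the function names are hypothetical.

```python
import numpy as np

def ternary_wavelet(shape, n=201):
    """Build G = 2 g(x, y) - g(x / sqrt(2), y / sqrt(2)) on an n x n grid.

    `shape(x, y)` must return a boolean mask of the object g.
    """
    coords = np.arange(n) - n // 2
    x, y = np.meshgrid(coords, coords)
    g = shape(x, y).astype(int)
    g_scaled = shape(x / np.sqrt(2), y / np.sqrt(2)).astype(int)  # object enlarged by sqrt(2)
    return 2 * g - g_scaled

# Gray level used to drive the SLM for each ternary state
GRAY_LEVEL = {1: 0, -1: 73, 0: 255}

def drive_image(wavelet):
    """Map ternary states (+1, -1, 0) to 8-bit SLM gray levels."""
    frame = np.zeros(wavelet.shape, dtype=np.uint8)
    for state, level in GRAY_LEVEL.items():
        frame[wavelet == state] = level
    return frame

def disk(x, y):
    """Hypothetical test object: a disk of radius 40 pixels."""
    return x**2 + y**2 <= 40**2

wavelet = ternary_wavelet(disk)
frame = drive_image(wavelet)
```

Inside the object the wavelet is +1, in the surrounding ring it is -1, and elsewhere it is 0. Because the enlarged object has twice the area of the original, the positive and negative regions cancel, up to pixelization error on a sampled grid.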


Fig. 3. Synthesis of ternary shape-specific wavelet filter. (a) Ternary triangular-shaped
wavelet (white, black, and gray regions possess +1, -1, and 0 values, respectively).
(b) The corresponding power spectrum of the ternary triangular-shaped wavelet (the dark
center shows the zero-mean characteristic of this wavelet). (c) Ternarized version of (b).


The spatial wavelet of 3(a) could be used to generate holographic Fourier filters to perform
optical wavelet transform in a holographic optical correlator. The ternary Fourier wavelet
filter shown in 3(c) could also be directly downloaded into a Fourier plane SLM for
real-time wavelet transform, using the setup shown in Figure 4. The half wave plate shown in
this setup is used to align the polarization to the molecular director of the second SLM.


We have experimentally demonstrated the shape discrimination capability of the
ternary-valued wavelet filter using the setup of Figure 4. As shown in Figure 5(a), an input
scene consisting of two triangular and two semi-elliptical objects is used. Due to the
similarity in shape between the triangular and semi-elliptical objects, conventional
inner-product template matching would not be able to discriminate the two types of objects. In
our experiment, a triangular-shaped wavelet filter was prepared and downloaded into the
Fourier plane LCTV SLM. The outputs without and with thresholding are shown in
Figures 5(b) and 5(c), respectively. The thresholded output demonstrates the superior
shape discrimination capability of the wavelet filter.
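
Why a zero-mean shape-specific filter discriminates better than a matched filter can be seen in a simplified numerical stand-in for the correlator. This sketch does not reproduce the triangle versus semi-ellipse experiment: it uses disks of two sizes, assumes the wavelet form G = 2g(x,y) - g(x/sqrt(2), y/sqrt(2)), and models the Fourier-plane correlator by an FFT-based circular correlation. All sizes and positions are hypothetical.

```python
import numpy as np

def disk(x, y, radius):
    """Binary disk of the given radius centered at the origin of (x, y)."""
    return (x**2 + y**2 <= radius**2).astype(float)

def correlate(scene, kernel):
    """Circular cross-correlation computed in the Fourier plane.

    The result is shifted so that entry [row, col] is the response with
    the filter centered on scene pixel (row, col).
    """
    spectrum = np.fft.fft2(scene) * np.conj(np.fft.fft2(kernel))
    return np.fft.fftshift(np.real(np.fft.ifft2(spectrum)))

n = 256
coords = np.arange(n) - n // 2
x, y = np.meshgrid(coords, coords)

target = disk(x, y, 12)                                           # matched filter
wavelet = 2 * target - disk(x / np.sqrt(2), y / np.sqrt(2), 12)   # ternary wavelet filter

# Scene: a target-sized disk on the left and a larger disk on the right
scene = disk(x + 60, y, 12) + disk(x - 60, y, 30)

row, small_col, large_col = n // 2, n // 2 - 60, n // 2 + 60
matched = correlate(scene, target)
shaped = correlate(scene, wavelet)
```

The matched filter gives the same peak value at the centers of both objects, since the larger disk fully covers the template. The zero-mean wavelet responds strongly only at the target-sized disk, because its positive core and negative ring nearly cancel inside the larger object.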


This  type  of  shape-discriminant  wavelet  filter  has  also  been  successfully  demonstrated  for 
mine  (circular-shaped)  detection,  and  car  license  plate  (rectangular  shaped)  detection. 


The  research  described  in  this  paper  was  performed  by  the  Center  for  Space 
Microelectronics  Technology,  Jet  Propulsion  Laboratory,  California  Institute  of 
Technology, and was sponsored by the Ballistic Missile Defense Organization/Innovative
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Science  and  Technology  Office,  through  an  agreement  with  the  National  Aeronautics  and 
Space Administration.
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[Figure labels: SLM1 (Input), Half Wave Plate, SLM2 (Filter); Polarizer 1, Polarizer 2, Polarizer 3, detector]

Figure 4. An Optical Wavelet Processor using a Fourier ternary wavelet filter.


Figure 5. Experimental demonstration of target detection using a real-time optical wavelet
processor: (a) input of two cone-shaped and two semi-elliptical-shaped objects; (b)
non-thresholded wavelet transformed output; and (c) thresholded output.
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Future Directions in "Smart" Quantum Well Spatial Light Modulators

and  Processing  Arrays 

David  A.  B.  Miller 
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USA 


Quantum  well  modulators  and  photodetectors  are  one  attractive  option  for  large  scale 
integration  of  arrays  of  optical  inputs  and  outputs  in  information  processing  systems. 
Optics  is  fundamentally  attractive  because  it  offers  basic  physical  advantages  in 
interconnections,  and  may  allow  novel  architectures  of  information  processing  systems 
not  well-suited  to  electronics  alone.  In  the  past,  large  arrays  of  quantum  well  devices  have 
been  used  in  experimental  systems,  and  more  recently  technologies  have  emerged  that 
have  allowed  "smart"  arrays  incorporating  electronics  both  for  added  functional 
complexity  and  reduced  optical  energy  requirements.  The  FET-SEED  technology,  for 
example,  has  integrated  GaAs  field  effect  transistors  with  quantum  well  modulators  and 
detectors  for  high  speed  circuits  with  sophisticated  functions,  and  has  already  been  used 
to  fabricate  multi-project  wafers  for  experimental  use  by  a  broad  range  of  users. 

Very  recently,  hybrid  integration  of  quantum  well  devices  with  complex,  mainstream, 
silicon  circuits  has  become  a  practical  reality.  This  advance  opens  up  a  broad  range  of 
new  possibilities  for  systems.  There  is  a  serious  prospect  of  large  complex  "smart" 
circuits,  made  from  the  most  capable  silicon  circuits,  and  operating  as  fast  as  the  silicon 
circuits  themselves  can  run,  but  unconstrained  by  the  usual  difficulties  of  electrical 
interconnects.  This  possibility  also  raises  challenges,  at  the  optoelectronic  device  and 
electronic  circuit  level;  for  example,  receiver  circuits  should  occupy  small  areas  and 
consume  low  powers  so  they  can  be  made  in  large  arrays,  but  they  must  at  the  same  time 
be sensitive, with low error rates and good immunity to the effects of neighboring
circuits.  The  challenges  and  opportunities  for  research  at  the  systems  level  are  perhaps 
even  greater.  Such  technologies  raise  the  prospect  of  optical  systems  with  functional 
complexity  well  beyond  previous  bounds  and  electronic  systems  with  scales  and 
topologies  of  interconnections  also  outside  most  previous  experience.  To  give  a  sense  of 
where  such  technologies  could  be,  the  talk  will  address  a  skeleton  roadmap  for  how  these 
technologies  could  progress  in  years  to  come. 
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Device-Architecture  Interaction  in  Optical  Computing 


Ravindra  A.  Athale 
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Device  technologies  and  processor  architectures  exert  a  strong  influence  on  each  other  in 
optical  computing.  I  will  discuss  examples  of  successful  and  unsuccessful  interactions 
between  these  two  communities.  The  role  of  the  CO-OP  in  enhancing  this  interaction  will  be 
outlined.
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THE HISTORY OF OPTICAL COMPUTING:

A  PERSONAL  PERSPECTIVE 
Adolf  W.  Lohmann 

Michael  visiting  professor,  Weizmann  Institute  of  Science,  Dept,  of  Physics  of 
Complex  Systems,  Rehovot  76100,  Israel. 

Permanently : Universität, Physikalisches Institut, Rommel Str. 1, 91058 Erlangen,
Germany. (FAX 49 - 9131 - 15249).

For  me  the  history  of  optical  computing  can  be  divided  into  several  phases,  which 
I  will  illustrate  by  examples.  The  phase  transitions  mark  changes  of  my  personal 
attitude  towards  optics  in  general  and  to  information  optics  in  particular. 


PHASE  1 :  From  the  Greeks  to  1950. 

Optics  had  been  taught  to  me  as  a  collection  of  phenomena,  some  of  them  nice  -  like 
the  rainbow  - ,  others  not  so  nice,  like  the  use  of  optics  as  a  weapon  by  Archimedes.  I 
tried  to  find  intellectual  structures  behind  the  collection  of  phenomena.  But  my 
attempt [1] was deemed insufficient for a Master's thesis. In my second attempt I
was  forced  to  develop  hardware,  a  two  -  layer  lithographic  grating  structure  [  2  ].  It 
was  a  valuable  experience. 

PHASE 2 : Analog Processing (1950 - 60).

Gabor’s  holography  was  exciting.  I  tried  to  suppress  the  twin  image  by  single 
sideband  holography.  The  success  was  very  modest  only.  But  I  learned  to  look  at 
optics as a means for signal and image processing [3].

PHASE 3 : The signal amplitude changed gradually from analog to quantised and
binary. The continuous space variable was fractured, or pixellated. Optical logic
occurred in Theta Modulation [4]. What is now called wavelength division
multiplexing was called lambda super-resolution 30 years ago. Hybrid mixtures of
digital and optical technologies became interesting, for example as computer
generated holograms. The optical implementation of residue number algebra was
tried in the FSU. That phase (1960 - 80) could perhaps be called “From analog to
digital”.

PHASE  4  :  Now,  during  1980  to  85,  the  optical  community  was  courageous  as 
seldom  before,  and  never  since.  The  digital  optical  computer  was  proclaimed  as  a 
goal.  Parallelism  was  the  magic  term.  The  perfect  shuffle  was  supposed  to 
contribute  to  this  movement.  The  relationship  with  the  electronic  community  looked 
like  the  confrontation  between  David  and  Goliath.  The  experience  was  not  always 
pleasant.  But  it  was  instructive.  Gradually,  a  transition  occurred  into : 

PHASE 5 : “Optics FOR Computing”, or “Optics WITHIN the Computer”.

That  required  the  matching  of  technologies.  Terms  like  optical  packaging,  from 
macro-optics  to  micro-optics  became  prominent.  My  own  advice  was  -  and  still  is  -  to 
use  the  existing  classical  macro-optics  wherever  sensible.  You  get  high  quality  for  a 
low  price.  I  do  not  mean  to  use  classical  macro-optics  exclusively.  That  would  be  as 
foolish  as  the  opposite  :  to  use  only  micro-optical  arrays.  Be  tolerant  and  try  hybrid 
macro-micro  approaches. 

PHASE 6 : (1995 - 2006)

Computation  and  communication  will  intermingle  more  and  more.  The  former  will 
remain dominated by electronics. After all, a silicon transistor costs less than a
micro $. Optics will progress as it has done already in the pure communications
technology, from large distances to shorter distances.

We should not look at the future as a battlefield of technology replacements.
Instead we should aim at HYBRID solutions, as has happened in the world of
transportation. We have cars and trains which move on a 2D surface, just like
electronic signals. The cars are self-routing, the trains under central control. And we
have air traffic in 3D, with considerable topological advantages. Within the air traffic
business there are two trends: to micro-planes (a mini-helicopter for everyone), and
to macro-planes (1000 passengers, travelling from one conference to the next
conference). Again, tolerance is asked for.

The overall transportation system is not perfect, but it does function, largely
because different technologies collaborate in a fairly sensible way. It does not require
much fantasy to imagine how much worse the situation could be. Let us learn our
lesson.

Acoustic processing of audio and sonar by animals involves the temporal as well as spatial aspects
of an incoming signal. We can presume that a bat, for example, acquires an entire spatial picture of
its surroundings from its sonar returns, rather than the empty series of blips that the untrained
human ear derives from the sound of a ship’s sonar; in effect the bat sees with its ears [1]. The
barn owl makes equally impressive use of hearing with passive sonar to locate and capture prey in
the dark. In these cases, and in speech recognition by humans, the sequential nature of the
information plays an essential role.

This presentation looks at the use of dynamic holography for processing temporal and spatial-
temporal information. We are motivated by the success of time-delay neural networks in speech
recognition [2, 3, 4] and by the evident structure of the sound processing mechanisms in bats and
owls. Furthermore, as we look at the required processing, we will find it as well suited to
holographic methods as many image processing tasks.

Our  context  for  this  presentation  is  the  recognition  of  temporal  sequences:  Imagine  listening  to  a 
radio  as  you  tune  across  a  short-wave  band.  It  is  easy  to  recognize  a  channel  that  carries  Morse 
code,  even  if  you  do  not  know  the  code.  That  is  because  Morse  code  consists  of  a  simple  set  of 
temporal features (a dot, a dash, and two pause lengths), and a Morse signal is characterized by
repeated  occurrences  of  these  features.  It  does  not  take  long  for  the  brain  to  identify  these  features 
as  the  dominant  content  of  the  received  signal.  We  accomplish  a  similar  task  with  a  holographic 
optical  system.  This  task  characterizes  a  number  of  acoustical  information  processing  problems 
and  serves  as  a  precursor  for  more  complex  systems. 

Acoustic processing in animals employs both frequency and time domain operations. The
short-time Fourier transform is part of the sound transduction mechanism itself (within the cochlea)
and the time domain is served by delay line structures, at least in the bat and barn owl. The time
domain is serviced by short and long term memory storage as well. For audio frequencies appropriate
delays are in the milliseconds to seconds regime. We implement delays of this scale using a rotating
photorefractive crystal [5].


As the delay line is an essential component in neural-like temporal signal processing, it is perhaps
worth describing the principle in some detail with the help of Figure 1. It shows a reference, or pump,
wave and a signal wave incident on a crystal which is (slowly) rotating with the rotational axis along
the pump wave. At a given moment t = t_0 the two waves form a holographic grating within the
medium; the reference wave scatters off the grating to reconstruct the signal at its incident polar
angle, defined as θ = 0. In this specific geometry the reference wave continues to scatter off the
grating as the crystal rotates, but the grating rotates, so the reconstructed signal sweeps out a cone
in time, as indicated in the figure. In the meantime, a new grating is being written at every moment.
Thus, the signal “now” reads out at θ = 0, while the signal at increasingly earlier times reads out at
increasingly larger angles. If we position ourselves at some angle θ, we will observe the signal at
a time t' = t + θ/Ω, where Ω is the rotation rate of the crystal. If the delays are continuously read
out, old gratings become erased as new ones are written, so the delayed signal experiences an
overall decrease in amplitude with time. Other than this fixed amplitude reduction (for a given delay
time) the signal is reproduced in this system in amplitude, frequency and phase. In practice we use
rotation rates of 0.5 to 2 rpm for total delays of up to about a second. Typically the desired delay
region is sufficiently short that the arc is nearly a straight line.

Figure 1. Photorefractive delay line having several parallel channels of delay. (Crystal: photorefractive BaTiO3.)
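The relation between readout angle and delay can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming only that the magnitude of the delay observed at readout angle θ is θ/Ω; the function names are illustrative and do not come from the paper.

```python
import math


def readout_delay_s(angle_deg, rotation_rpm):
    """Magnitude of the delay (seconds) seen at a given readout angle.

    Assumes delay = theta / Omega, with Omega converted from rpm
    to degrees per second (1 rpm = 6 degrees per second).
    """
    omega_deg_per_s = rotation_rpm * 360.0 / 60.0
    return angle_deg / omega_deg_per_s


def readout_angle_deg(delay_s, rotation_rpm):
    """Readout angle (degrees) at which a given delay appears."""
    omega_deg_per_s = rotation_rpm * 360.0 / 60.0
    return delay_s * omega_deg_per_s


if __name__ == "__main__":
    for rpm in (0.5, 1.0, 2.0):
        angle = readout_angle_deg(1.0, rpm)
        print(f"{rpm} rpm: a 1 s delay is read out at {angle:.1f} degrees")
```

At 1 rpm the crystal turns 6 degrees per second, so a one-second delay window spans only about 6 degrees of arc, which fits the remark that the arc is nearly a straight line.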

One  can  use  an  array  of  inputs  rather  than  a  single  one,  and  thereby  have  a  collection  of  parallel 
delay  lines.  Figure  1  shows  readout  by  the  same  reference  beam  used  to  record  the  grating.  In  our 
work  we  often  use  the  phase  conjugate  of  the  reference  to  produce  a  phase  conjugate  of  the  signal. 


In previous work we have used the photorefractive delay line to implement a time-delay neural
network for word recognition [6]. Now we describe the mechanism for temporal feature extraction [7].
Schematically, our system consists of an optical resonator that contains two photorefractive crystals
as processing elements (Figure 2). The first crystal (gain crystal) provides the amplification necessary
for oscillation. It is pumped by a Gaussian wave that carries the temporal information S(t) of interest.
The second crystal provides delay as we have described above; the rotation rate for this task is about
one rpm.

Figure 2. Schematic of a single temporal feature extractor.

The resonator field can build up in a number of spatial modes which are aligned along the time-delay
coordinate. The delay line provides a (time-shifted) unidirectional coupling among the spatial modes.
In this way, the spatial structure of the collection of all the modes, which we call collectively a
chronomode, is modified each time it traverses the delay element. The equilibrium structure is
determined by the temporal characteristics of the input signal S(t).

Figure 3. Experimental apparatus for temporal feature extractor.


In the gain crystal the interaction of the chronomode light with the pump wave S(t) creates a
photorefractive grating by a conventional two-wave mixing process. This grating matches the
particular chronomode spatial structure. It can be thought of as a matched filter, in the sense that it
permits resonator oscillation only when the dominant temporal feature is present at the input.

Figure 3 shows the experimental implementation of the schematic of Figure 2. Note that the delay
line is used with phase-conjugated readout using an additional photorefractive crystal in a
four-wave mixing configuration.


In our experiments we have shown that a temporal feature occurring twice as often as other ones is
chosen by the system with a response contrast ratio exceeding 10:1 [7]. An expanded version of the
above system has also been developed; it contains a second ring resonator, and thus two
chronomodes. The two chronomodes share the same gain crystal and compete for the pump energy.
This competition forces different features in the input signal to be associated with different
chronomodes. Figure 4a shows the two input signals applied alternately to modulate the input beam.
Figure 4b shows an instantaneous image of the two elongated chronomodes in response to Signal 1.
The lower chronomode responds most strongly. When Signal 2 is applied, in contrast, the upper
chronomode oscillates strongly (Fig. 4c).

In a Morse signal the letters of the alphabet are composed of a short sequence of basic Morse
features; words are composed of a sequence of letters, and so on. By cascading temporal feature
extractors like the one described here one can hope to extract the feature hierarchy contained in
Morse code and other complex signals.


Figure 4. Temporal feature extraction. a) Two different temporal features (Signal 1 and Signal 2;
scale bar 200 ms) shown repetitively with equal probability. b) Response of two chronomodes to
Signal 1: lower ring responds most strongly. c) Response of two chronomodes to Signal 2: upper
ring responds most strongly.

General purpose supercomputing is best accomplished today with the use of
multiple high-performance central processors. Performance of these systems is
greatly dependent upon the operating frequency of each individual processor,
which begins with the operating frequency of the individual integrated circuit
components used to implement the logic. As the operating frequency of integrated
circuit components increases, taking advantage of this improved frequency
becomes increasingly difficult. One of the big
challenges is to distribute a clock to each of the approximately 18 million
latches in the system with minimal skew so that all latches operate
synchronously. Two properties of clock distribution that are critical to system
performance are:

Skew - Time difference in the clock signal delivered to any two latches
in the entire system.

Signal Integrity - The quality of the clock signal delivered to any
latch in the system.

With these properties in mind, distribution of a 500 megahertz clock signal
seemed to be a natural fit for optics. The original system skew budget, combined
as a Root Sum Squared (RSS) value, is listed in Table 1.

Table 1

COMPONENT                                                            Original skew budget
A. Fiber trim for 4 x 4 Star Coupler                                 +/- 10.0 ps
B. Fiber trim for 1 x 24 Tree Coupler                                +/- 10.0 ps
C. Laser, Receiver and Clock distribution to Logic Gate Array I.C.   +/- 35.0 ps (combination)
D. Clock Distribution within Logic Gate Array I.C.                   +/- 92.0 ps
TOTAL RSS VALUE                                                      +/- 99.4 ps
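The RSS total in Table 1 can be reproduced directly from the four listed contributions. The sketch below assumes the usual definition of root sum squared (square root of the sum of the squared tolerances); the dictionary keys are shorthand labels chosen here, not names from the paper.

```python
import math


def rss(tolerances_ps):
    """Root-sum-squared combination of independent +/- tolerances (ps)."""
    return math.sqrt(sum(t * t for t in tolerances_ps))


# Contributions A to D as listed in Table 1 (values in picoseconds).
table1_ps = {
    "A_fiber_trim_4x4_star": 10.0,
    "B_fiber_trim_1x24_tree": 10.0,
    "C_laser_receiver_clock_to_gate_array": 35.0,
    "D_clock_within_gate_array": 92.0,
}

total_ps = rss(table1_ps.values())

if __name__ == "__main__":
    print(f"RSS skew budget: +/- {total_ps:.1f} ps")
```

Because the terms are squared before summing, the 92 ps on-chip contribution dominates: the three optical and board-level items together add only about 7 ps to the total.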

When component selection for this optical clock distribution
system began, it was discovered that the demands of
supercomputer system performance and packaging were not addressed by the
telecommunications market. To take advantage of the noise immunity offered
by optics, a significant development effort would be required. A major
portion of this effort was the development of an optical transmitter/receiver
link. This was treated as a single effort so that the performance of
the link could be developed to meet the system performance goal.

Completion of system architecture and partitioning, along with the maximum
system configuration, defined how many copies of the optical clock signal would
be required. It also made clear the number of components that would
be required in the distribution path from the optical transmitter to the
optical receiver (Figure 1).


Table 2

COMPONENT                                              Skew budget
D. Fiber trimming for 4 x 4 Star Coupler               +/- 10.0 ps
E. 1 x 24 Tree Coupler Uniformity                      Accounted in Rcvr spec.
F. Fiber trim for 1 x 24 Tree Coupler                  +/- 10.0 ps
G. Fiber trim at Coupling jumper to Rcvr pigtail       +/- 10.0 ps
H. Optical Receiver                                    +/- 70.0 ps
   - (Propagation variation +/- 25 ps)
   - (Jitter noise +/- 20 ps)
   - (Duty cycle distortion +/- 25 ps)
I. Printed Circuit Board                               nulled out
J. Clock distribution to Logic Gate Array I.C.         +/- 30.0 ps
K. Clock Distribution within Logic Gate Array I.C.     +/- 92.0 ps
TOTAL RSS VALUE                                        +/- 123.2 ps

Table 3

                Splitting (dB)   Insertion loss (dB)   Excess loss (dB)    Uniformity (dB)
Configuration   ideal            typical   maximum     typical   maximum   typical   maximum
4x4             -6.02            6.30      7.10        0.28      1.08      0.60      0.70
1x24            -13.80           14.50     16.30       0.70      2.50      1.60
4x4-1x24        -19.82           20.80     23.40       0.98      3.58      2.20

OPTICAL POWER BUDGET CALCULATION

Cascaded Couplers & Laser                              typical    maximum
Fan out loss (calculated, dB)                          19.82      19.82
Excess Loss (dB)                                       0.98       3.58
Connector losses (dB)                                  1.50       2.25
Total Losses (dB)                                      -22.30     -25.65
Average Optical Power Source (50% duty cycle, dBm)     6.99       6.99
Average Power delivered (dBm)                          -15.31     -18.66
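The power budget above is a straightforward sum in decibels, and it can be re-derived as a check. The sketch below assumes that the calculated fan-out loss is the ideal splitting loss 10·log10(ports) of the cascaded 4 x 4 and 1 x 24 couplers, which matches the 19.82 dB figure in the table; the function and variable names are illustrative.

```python
import math


def ideal_split_loss_db(ports):
    """Ideal splitting loss of an equal 1-to-`ports` power divider, in dB."""
    return 10.0 * math.log10(ports)


def delivered_power_dbm(source_dbm, excess_db, connector_db):
    """Average power at the receiver after the cascaded 4x4 and 1x24 couplers."""
    fan_out_db = ideal_split_loss_db(4) + ideal_split_loss_db(24)
    total_loss_db = fan_out_db + excess_db + connector_db
    return source_dbm - total_loss_db


SOURCE_DBM = 6.99  # average optical source power at 50% duty cycle, from the table

typical_dbm = delivered_power_dbm(SOURCE_DBM, excess_db=0.98, connector_db=1.50)
maximum_loss_dbm = delivered_power_dbm(SOURCE_DBM, excess_db=3.58, connector_db=2.25)

if __name__ == "__main__":
    print(f"typical delivered power:      {typical_dbm:.2f} dBm")
    print(f"maximum-loss delivered power: {maximum_loss_dbm:.2f} dBm")
```

The two results agree with the tabulated -15.31 dBm and -18.66 dBm, so the receiver has to work over a range of roughly 3.4 dB between the typical and maximum-loss cases.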

As the final system evolved, the impact of the interdependencies between all
components in the optical system became apparent, more specifically in the
transmitter/receiver link. The critical transmitter/receiver
specifications that directly impact the system clock skew were finalized
(Table 2, item H). With all final components of the optical system and
specifications in place, the optical power budget was also clearly defined
(Table 3).

The optical clock distribution system for use in a 500
megahertz supercomputer system has been successfully realized.

INTRODUCTION

In massively parallel computers, electrical networks have serious problems regarding pin bottlenecks in switches
and the number of pathways between processor elements (PEs). In electrical networks, the number of pins on a chip
decides the channel size of electrical crossbar switches, and hence the network size. The number of pins for 16 ch
crossbar switches exceeds 1K (1,024) when 32 ch external buses are used. An electrical switch on a single chip may
therefore be limited to the 16 ch size. By using the 16 ch switches, a maximum 128 ch Clos network [1], with only a
strictly non-blocking function, may be accomplished. Photonic technologies may serve larger crossbar switches, and
achieve networks of more than 1K ch. Free-space optics may also overcome pathway problems, because light
beams can cross each other with no mutual interference. Various data multiplexing technologies may be used in
optical networks.

To solve these problems in electronics, this paper describes interconnection networks with non-blocking and
self-routing functions suitable for photonic technologies. In order to accomplish the optical networks,
wavelength-division multiplexing (WDM) and space-division multiplexing (SDM) switches are also proposed. Basic
experiments on the optics in the switches indicate the maximum network size.

NETWORK  ARCHITECTURE 

A self-routing function is very important for conventional massively parallel computer networks, to avoid complex
path-hunting. Figure 1 shows the proposed 3-stage network with non-blocking and self-routing functions. In order
to achieve N x N networks, N sets of 1 x m switches, m² sets of N/m x N/m switches, and N sets of m x 1 switches
are required. The switches function in the same way as the switches in multi-stage self-routing networks:
address signals at the head of the data are recognized in the switches, and an output port for the data is designated.
The network operates similarly to a full crossbar switch. Three-stage networks bring low-latency
interconnections.
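The switch counts for the 3-stage network follow from simple link bookkeeping: the N first-stage switches each fan out to m links, and these N·m links must be absorbed by middle switches with N/m inputs each. The sketch below is illustrative only; the function name and dictionary keys are assumptions made here, and the example values N = 1024 and m = 16 are taken from the 1K ch configuration discussed later in the paper.

```python
def three_stage_switch_counts(n, m):
    """Switch counts for an N x N three-stage network with fan-out m.

    Stage 1: N switches of size 1 x m.
    Stage 2: (N * m) / (N / m) = m * m crossbars of size N/m x N/m.
    Stage 3: N switches of size m x 1.
    """
    if n % m != 0:
        raise ValueError("N must be a multiple of m")
    middle_ports = n // m
    links_between_stages = n * m
    middle_count = links_between_stages // middle_ports  # equals m * m
    return {
        "stage1_1xm_switches": n,
        "stage2_crossbar_ports": middle_ports,
        "stage2_crossbar_count": middle_count,
        "stage3_mx1_switches": n,
    }


counts_1k = three_stage_switch_counts(1024, 16)

if __name__ == "__main__":
    for name, value in counts_1k.items():
        print(f"{name}: {value}")
```

For N = 1024 and m = 16 this gives 256 middle-stage crossbars of 64 x 64, which is the 64 ch SDM crossbar size the paper settles on.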

OPTICAL  IMPLEMENTATION 

In order to achieve the proposed networks, 1 x m WDM switches, SDM crossbar switches and m x 1 SDM switches
are used at the 1st, 2nd and 3rd stages of the networks, respectively. Figure 2 shows data and control flow in the
network. Transmitter modules for the WDM switches and the m x 1 SDM switches are located on PE boards.
Receiver modules for the WDM switches are located on the SDM crossbar switches. Output data from PEs are
multiplexed in the time domain. All output data on a board are multiplexed in the wavelength domain and
transferred to the SDM crossbar switches. Electrical circuits on the SDM crossbar switches recognize address signals,
and the desired PEs are designated. The SDM crossbar switches connected to the same PE are linked by a bus for
arbitration. Each crossbar switch confirms the condition of the desired PEs by using the bus. All control circuits for
the 3 switches are located on the SDM crossbar switch. After all pathways are prepared, high-speed data are transferred
to the PEs.

Figure 3 shows 4 pairs of the 1 x 4 WDM switches [2]. In a transmitter module, output light beams from
multi-wavelength vertical-cavity-surface-emitting-lasers (VCSELs) [3] are coupled into a multi-mode optical fiber
using planar microlenses (PMLs) and a Selfoc microlens (SML). In a receiver module, the beams are separated into
m signals by using a 1 x m star coupler. Each separated beam is incident on a grating. Beams of different wavelengths
are split off and are incident on 1-D photo-diodes (PDs). Detected signals are selected by recognizing address signals.

Figure 4(a) shows the 64 ch SDM crossbar switch. 64 units of 8 x 8 VCSELs and an 8 x 8 multi-mode
optical fiber array are connected by hybrid 8f imaging optics with PMLs, SMLs and beam-splitters (BSs), as shown
in Fig. 4(b) [4]. An individual pixel, located at the same position in the VCSELs, is imaged onto a pixel at the same
position in the optical fiber array. By selecting one of the VCSEL pixels, outputs are designated. By selecting some
of the VCSEL pixels, multicast or broadcast functions are available. Optics size may be reduced to 80 mm x 90
mm x 4 mm by using PMLs with 250 µm pitch and SMLs with 4 mm diameter. Total switch size, including
electrical circuits, may be about 130 mm x 130 mm. Each output light beam from the VCSELs passes through 6 BSs,
and total coupling losses are designed to be 18 dB (6 x 3 dB). However, polarizing and wavelength multiplexing
technologies reduce the losses.

Figure 5 shows the m x 1 SDM switch. Output light beams from the SDM switches are concentrated by
using bundled optical fibers with large cores. They are incident on a beam coupler and focused onto a PD by using
PMLs and an SML, which has the same construction as the transmitter modules in the WDM switches.

EXPERIMENTS

On 1 x 8 WDM switches, the optical power budget was measured as shown in Fig. 6. A transmitter module, a
coupler and a receiver module have 3 dB, 11 dB and 5.5 dB optical losses, respectively. Eight beams with
mutually different wavelengths (940 nm - 980 nm) from multi-wavelength VCSELs are transferred to the
receiver module. Optical beams whose wavelengths differ by 10 nm are separated with -10 dB crosstalk
by using a grating with a 2 µm pitch.

On SDM crossbar switches, alignment tolerances for the optics, in order to achieve 64 ch switches, were
measured as shown in Table I. The hybrid 8f imaging optics has large tolerance for misalignments. When PMLs
with 250 µm pitch, SMLs with 4 mm diameter, conventional BSs and glass blocks are used in the module, 90 µm
and 3.5' misalignments are allowed, if 2 dB loss is permitted in the optics. Electro-photonic MCM (Multi-Chip
Module) technologies [5], whereby optics units, optical devices and electronic chips are mounted on the same
substrate, may assist packaging for the switch. Packaging accuracies using these technologies are also shown in
Table I. They may realize alignment-free packaging for the switch.

On the m x 1 switches, the maximum channel number was evaluated. Bundled optical fibers with 200 µm core,
PMLs with 250 µm pitch, and an SML with 4 mm diameter were used in the switch. If 3 dB loss is permitted, 31
optical fibers may be integrated in the module, and 31 x 1 switches may be achieved.

NETWORK SIZE

On the WDM switch, optical losses in the receiver module may improve by 3 dB through using a blazed grating,
and optical beams with 5 nm wavelength pitch may be separated by using a grating with 1 µm pitch. In the
future, 1 x 16 switches may be achieved, if 20 dB loss is permitted.
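One way to see why a 20 dB allowance is plausible for a 1 x 16 switch is to scale the measured 1 x 8 loss figures. The sketch below is only an illustration: it assumes (this is not stated in the paper) that doubling the coupler fan-out from 8 to 16 adds just the ideal splitting penalty of 10·log10(2) dB, with transmitter loss and coupler excess loss unchanged, and that the blazed grating recovers the full 3 dB in the receiver.

```python
import math

# Measured losses for the 1 x 8 WDM switch, in dB (from the EXPERIMENTS section).
measured_1x8_db = {"transmitter": 3.0, "coupler": 11.0, "receiver": 5.5}
total_1x8_db = sum(measured_1x8_db.values())

# Assumption (not from the paper): going from 8 to 16 ports adds only the
# ideal splitting penalty to the coupler loss.
extra_split_db = 10.0 * math.log10(16 / 8)

# The 3 dB receiver improvement from a blazed grating is taken from the text.
projected_1x16_db = (
    measured_1x8_db["transmitter"]
    + measured_1x8_db["coupler"] + extra_split_db
    + measured_1x8_db["receiver"] - 3.0
)

if __name__ == "__main__":
    print(f"measured 1 x 8 total loss:   {total_1x8_db:.1f} dB")
    print(f"projected 1 x 16 total loss: {projected_1x16_db:.2f} dB")
```

Under these assumptions the extra splitting loss and the grating improvement nearly cancel, leaving the projected total just under the 20 dB allowance quoted above.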

On the SDM crossbar switch, a 256 ch capability may be available by using the proposed optics. However,
packaging density, which means channel number per volume, becomes minimum when the channel number ranges
from 16 to 64. Furthermore, module size and VCSEL array size in the 64 ch units are reasonable for fabrication
and maintenance. The maximum channel number was therefore set at 64 ch.

The above discussions show that 1K ch networks may be accomplished by using the 1 x 16 WDM switches,
the 64 ch SDM crossbar switches and the 16 x 1 SDM switches. The networks have the same functions as a full
crossbar switch. Furthermore, by combining with 1 x n and n x 1 electrical switches, nK ch networks may be
achieved. Considering pin bottleneck problems in electrical chips, 1 x 32 or 32 x 1 switches may be available.
By using these switches, 32K ch networks may be achieved.

The network is now compared with the Clos network and a full crossbar switch. The proposed networks
require 1.8, 3.3 and 5.3 times as many 1 x 2 or 2 x 1 switches as the Clos network in
256, 512 and 1K ch systems, respectively. Each value is almost the same as that for the full crossbar switch. The
Clos networks have no self-routing functions, and it is difficult to fabricate full crossbar switches with large
channel numbers. The proposed networks have advantages in regard to a self-routing function and scalability to large
channel networks.

SUMMARY 

Optical  interconnection  networks  with  non-blocking  and  self-routing  functions  were  described.  The  1  x  m  WDM, 
the  SDM  crossbar  and  the  m  x  1  SDM  switches  were  proposed  for  achieving  the  networks.  Basic  experiments 
showed that optics for the networks may have the capability to scale up to 1K ch. The networks may
overcome  problems  in  electronics. 

The authors gratefully thank I. Ogura and T. Yoshikawa for fabricating optical devices used in the
experiments. They also thank K. Kobayashi and Y. Ogura for their suggestions and encouragement.

Table I. Alignment tolerance and packaging accuracy for the SDM crossbar switch.

The performance of a MIMD parallel computer is critically impacted by the interconnection network
performance, which in turn is determined by the network topology, implementation hardware, and
communication protocol. Cellular hypercube (CH) interconnection networks, with emphasis on a
symmetric cellular hypercube (SCH) network, were studied for the system discussed in this paper
because they can exploit the communication locality observed in parallel applications [1], are
reasonably scalable due to their O(log N) connectivities, and can be implemented with moderate
requirements on the number of wavelength channels needed. While free-space optics can realize
highly parallel CH networks [2], little progress has been made in designing an efficient protocol
for optical data communication. In this paper a CH interconnection system based on a collisionless
wavelength-division multiple access with reroute (WDMA-R) protocol is proposed. This system
incorporates space-, time-, and wavelength-multiplexing to achieve dense communication, simple
control, and multiple access. Analytic models based on semi-Markov processes were employed to
analyze this protocol. The performance of the protocol in terms of network throughput and data
packet delay is evaluated and compared to other protocols.

The communication protocol described in this paper is intended to be used on a MIMD message
passing parallel computer that has CH optical interconnections. In this computer, each processing
node consists of an electronic processor, an electronic local memory, and an optoelectronic
input/output interface. Optical fibers are used to guide signals to/from the free-space optical
interconnection network and the actual CH interconnection pattern is implemented by the hologram
array (Fig. 1). (For cases in which a smart pixel array implements the set of processing nodes, the
optical fibers are not needed.) Optical beams, after being diffracted by the hologram array, are
Fourier transformed to the output plane via a bulk lens or a lens array (not shown). Issues related
to feasibility and scalability of a similar optical system have been extensively evaluated [3]. It has
been suggested that such an optical system could support thousands of processing nodes in a
compact volume. As shown in Fig. 1, each input pixel (corresponding to a processing node) in an
N x N array is connected to the other pixels at distances of ±2^k, k = 0, 1, ..., along both the x and y
dimensions. SCH differs from the conventional cellular hypercube by wrapping around the
connections in both dimensions to make the network logically symmetric to each processing node.
For example, Fig. 2 illustrates the interconnections of a 1-D SCH. Routing paths in a 2-D SCH can be
determined by:

\[
\begin{cases}
\bigl( (D_i - S_i + (N-1)/2) \bmod N \bigr) - (N-1)/2, & N \text{ odd} \\
\bigl( (D_i - S_i + N/2) \bmod N \bigr) - N/2, & N \text{ even}
\end{cases}
\]

in which D_i and S_i represent the destination node and the source node addresses (respectively)
along the i (x or y)-dimension. For example, if N = 33, (D_x, D_y) = (18, 4), and (S_x, S_y) = (3, 12),
the routing path is then determined by (15)_x (-8)_y = +(01111)_x -(01000)_y. Thus, it needs to take
the sequence of k = +4, +3, +2, +1 connections (4 hops) along the x-dimension and a k = -4
connection (1 hop) along the y-dimension for a data packet to be transmitted from the source node
to the destination node. The order of connections may be varied as long as each connection link is
performed exactly once.
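The routing rule above can be tried out numerically. The following is a minimal sketch, not the authors' implementation: the function names are chosen here for illustration, and the hops are reported as signed power-of-two distances (the binary digits of the offset) rather than as connection numbers k.

```python
def routing_offset(dest, src, n):
    """Signed wrap-around offset along one dimension of an SCH with N nodes.

    Implements ((D - S + h) mod N) - h, with h = (N - 1) / 2 for odd N
    and h = N / 2 for even N.
    """
    half = (n - 1) // 2 if n % 2 == 1 else n // 2
    return (dest - src + half) % n - half


def hop_distances(offset):
    """Decompose a signed offset into signed power-of-two hops, largest first."""
    sign = 1 if offset >= 0 else -1
    magnitude = abs(offset)
    return [
        sign * (1 << bit)
        for bit in range(magnitude.bit_length() - 1, -1, -1)
        if (magnitude >> bit) & 1
    ]


if __name__ == "__main__":
    n = 33
    dx = routing_offset(18, 3, n)   # x-dimension of the example in the text
    dy = routing_offset(4, 12, n)   # y-dimension of the example in the text
    print("x offset:", dx, "hops:", hop_distances(dx))
    print("y offset:", dy, "hops:", hop_distances(dy))
```

For the example in the text this yields offsets of 15 and -8, that is, four hops along x and a single hop along y. Each set bit in the binary offset corresponds to one link, which is why the links can be taken in any order.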

Several features are provided by the CH networks. First, the routing algorithm can directly provide
multiple routing paths. Second, by providing denser communication links to nodes within a local
neighborhood, the network can make use of the reference locality observed in most parallel
applications.

Time-division multiple access (TDMA) and wavelength-division multiple access (WDMA) are
evaluated for comparison with the proposed WDMA-R protocol. All these protocols employ a
two-phase algorithm, which reduces packet waiting time and/or the total number of wavelength
channels; i.e., processing nodes first send data packets (in TDMA) or reserve data channels (in
WDMA and WDMA-R) in one dimension, and then repeat the process in the other dimension. In
addition, each node is equipped with one inbound buffer (size one) and one outbound buffer
(size B) to store incoming and outgoing data packets; this provides for concurrent data
transmission and reception. These three protocols are explained below.

TDMA:  A  class  is  defined  as  a  set  of  processing  nodes 
whose  fanouts  do  not  cause  any  contention.  Each  class  is 
preassigned  a  unique  time  slot  per  cycle  for  data  trans¬ 
mission  [4].  The  slot  size  is  fixed  at  length  L,  equal  to 
the  duration  for  transmitting  a  data  packet.  For  example, 
there are a total of M=11 classes when N=22 (Fig. 3).
This  system  requires  each  node  to  be  equipped  with  one 
pair  of  transceivers. 

WDMA-R: Instead of assigning a different time slot to
each class as in TDMA, a distinct wavelength channel is
assigned to each class of nodes (Fig. 3). This system
employs a reservation scheme that uses one control
channel for data channel reservation and M data channels
for data transmission. Each processing node is equipped
with one fixed-tuned transmitter FT (tuned to its home
wavelength channel), one tunable receiver TR (capable
of tuning to any one of the data channels), one transmitter
FTc, and one receiver FRc (both fixed at the common
control channel wavelength). FT and TR are used on the
data channels for packet transmission and reception. FTc
is used to send out control packets, while FRc continually
monitors the control channel to receive all the control
packets.

Each control cycle on the control channel contains two
phases: an x-control subcycle and a y-control subcycle, as
described previously. Each x (or y)-control subcycle is further
split into one “status slot” and one “reservation slot”
(Fig. 4). Each node has a different look-up table to map
the channel numbers to the corresponding node numbers
and connection link numbers (k); a distinct status minislot
and a reservation minislot are preallocated to each
processing node. By setting its corresponding status
minislot, a processing node can notify other nodes of its
status: free, busy, full, or ready.

Any  node  (sender)  that  wants  to  transmit  a  data  packet 
to  a  target  node  first  monitors  one  control  subcycle  to  see 
if  the  channel  is  available  (Fig.  5).  If  the  target  node  is 
not  currently  available  (could  be  busy,  full,  or  reserved), 
the  sender  either  chooses  another  channel  (from  one  of 
the  other  possible  target  nodes)  or  waits  for  the  next  con¬ 
trol  subcycle  (if  all  the  possible  target  nodes  are  unavail¬ 
able).  Once  the  channel  is  available,  the  sender  reserves 
the  channel  by  inserting  the  target  node  address  in  the 
corresponding  minislot  of  the  reservation  slot  and  then 
sends  out  a  data  packet.  After  receiving  the  request,  the 
target  node  tunes  its  detector  TR  to  the  home  channel  of 
the  sender  for  data  reception.  Meanwhile,  the  target 
node  will  issue  a  busy  state  on  the  status  slot  to  notify  the 
other  nodes  until  the  end-of-packet  header  is  received. 
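
The sender's choice in each control subcycle can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: the names `Status` and `choose_target` are hypothetical, and treating only the "free" state as available is an assumption, since the text does not spell out how the "ready" state is handled.

```python
from enum import Enum


class Status(Enum):
    FREE = "free"
    BUSY = "busy"
    FULL = "full"
    READY = "ready"


def choose_target(candidates, status, reserved):
    """Return the first candidate target that is free and not yet reserved.

    candidates: possible next-hop nodes, in order of preference.
    status:     dict mapping node -> Status read from the status slot.
    reserved:   set of nodes already claimed in the reservation slot.
    Returns None when every candidate is unavailable, in which case the
    sender waits for the next control subcycle.
    """
    for node in candidates:
        # Assumption: only FREE counts as available.
        if status[node] is Status.FREE and node not in reserved:
            return node
    return None


status = {5: Status.BUSY, 9: Status.FREE}
print(choose_target([5, 9], status, reserved=set()))   # falls back to node 9
print(choose_target([5, 9], status, reserved={9}))     # nothing available
```

Returning None models the "wait for the next control subcycle" branch; plain WDMA would correspond to calling the function with a single fixed candidate.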

The  proposed  WDMA-R  protocol  offers  several  dis¬ 
tinct  features.  First,  it  supports  various  types  of  routing 
algorithms,  albeit  with  varying  degrees  of  efficiency: 
store-and-forward,  virtual  cut-through,  wormhole  rout¬ 
ing,  and  circuit  switching  [5].  Second,  with  the  use  of 
the  status  slot,  this  system  provides  the  flexibility  of 
transmitting variable-size data packets at various
transmission rates. It can also support fault tolerance and
signal rerouting by avoiding connections with busy
or malfunctioning nodes. Third, packet waiting time is
reduced as compared to TDMA because of the reduced
length of one cycle time. Fourth, only a moderate number
of wavelength channels is required (20 channels for 10^
nodes). Furthermore, as compared to the protocol of Ref.
[6],  each  node  requires  a  smaller  status  buffer  size  and 
less  computation;  also  due  to  the  CH  topology,  time  shar¬ 
ing  of  wavelength  channels  is  not  necessary,  and  the 
system  scalability  is  no  longer  limited  by  the  optical 
power  budget. 

WDMA is similar to WDMA-R except that once a target
node is selected, the source node will wait until the target
node is available without switching to another target
node.

Semi-Markov models [6] were used to analyze the
performance of TDMA, WDMA, and WDMA-R. These
models were based on the following assumptions: an SCH
interconnection network is considered; all nodes behave
independently; the basic time unit is defined as the size
of one status minislot; packet arrival at each outbound
buffer follows a Poisson process with rate λ (packets per
unit of time); a node can generate at most one packet per
unit of time; the data packet size is fixed; a sender transmits
data to any target node with equal probability; a
store-and-forward routing algorithm is employed. The
performance was evaluated in terms of network throughput
and data packet delay. Network throughput is defined as
the average number of packets transmitted through the
network per unit of time. Data packet delay is the time
duration from a packet's arrival at an outbound buffer of
a sending node to the packet's reception by a target node.
System performance was evaluated for various data
packet sizes, outbound buffer sizes, and numbers of
processing nodes.

Our  simulations  demonstrate  that  the  proposed 
WDMA-R  outperforms  both  WDMA  and  TDMA  in  all 
cases. As the size of data packets (L) is increased from
20 to 50, network throughputs are reduced and data
packet delays are increased in all three protocols (Fig. 6).
TDMA is the most sensitive to changes in data packet
size because its cycle time is proportional to L. WDMA-R
scales slightly better than TDMA and WDMA as the
number of processing nodes grows because of the increase
in the number of possible paths from a source
node to a destination node, which results in a lower
probability of blocking (Fig. 7). Performance as a function of
outbound buffer size (B) will also be presented.

This work was supported in part by AFOSR (Grant
No. F49620-93-1-0437) and ARPA (Grant No.
F49620-94-1-0045).
Fig. 6 Impact of packet size (L) and arrival rate on performance (N=33).


Fig.  5  WDMA-R  protocol.  S,  status  slot;  R,  reservation  slot. 
1.0 Introduction

The  mesh  is  one  of  the  most  popular  interconnection  networks  for  single-instruction  multiple-data  (SIMD)  [1]  array 
processors.  For  mesh-connected  array  processors,  the  communication  latency  is  proportional  to  the  diameter  (about 
two  times  the  linear  size)  of  the  array,  which  is  the  maximum  number  of  nodes  a  message  has  to  travel  to  reach  its 
final  destination.  To  decrease  the  diameter  of  the  mesh,  we  have  developed  an  optoelectronic  reduced  cellular 
hypercube  (RCH)  interconnection  [2]  which  is  combined  with  the  mesh  to  form  a  combined  network  called  M-RCH. 
In  this  paper,  we  present  a  time  multiplexing  scheme  for  the  optoelectronic  RCH  and  discuss  the  communication 
speedup of the M-RCH over the mesh for some common operations and applications.

2.0  Background 

The  optoelectronic  processor  discussed  here  is  a  dense  mesh-connected  SIMD  array  processor,  which  is  built  on  one 
or  a  few  wafers  or  multichip  modules.  The  mesh  is  represented  by  the  regular  2-D  array  of  N  x  N  processing 
elements  (PEs)  (with  N  a  power  of  two)  shown  in  Fig.  1.  Each  PE  has  bidirectional  communication  with  its  local  4- 
connected  neighbors  through  electrical  interconnections.  There  are  no  wrap-around  electrical  connections  between 
PEs  at  the  edges  of  the  array.  In  addition,  each  PE  has  one  optical  source  (shown  as  a  square)  and  one  or  more 
detectors  (shown  as  circles)  to  support  the  optical  RCH  interconnections.  Since  this  is  a  SIMD  machine,  the  same 
instruction  is  broadcast  from  an  array  control  unit  to  all  PEs,  but  only  selectively  activated  PEs  will  synchronously 
execute the instruction on their local data.

To  speed  up  the  inter-PE  communication  of  the  mesh,  the  optical  cellular  hypercube  (CH)  [3], [4]  and  optical 
reduced  cellular  hypercube  (RCH)  [2]  are  two  enhancements  for  the  electrical  mesh.  The  CH  optically  connects  one 
PE  to  other  PEs  at  distances  in  the  connection  set  (CS)  which  consists  of  all  integer  powers  of  two  less  than  half  the 
diameter, i.e. {1, 2, 4, ..., N/2} in the ±x and ±y directions. The RCH is similar to the CH, but, in general, its CS
is a subset of the CH CS. Thus, the CS of the RCH consists of l elements {2^(k_1), 2^(k_2), ..., 2^(k_l)}, where l > 1. Both the
CH and RCH interconnections are shift-invariant. Figure 2 illustrates a “transmissive” optical setup for the RCH
using  a  smart  pixel  array  processor  having  detectors  at  the  left  side  of  the  chip  and  sources  at  the  right  side.  A 
transmissive  diffractive  element  in  the  pupil  plane  provides  the  fixed  interconnection  point-spread  function.  The 
actual  physical  realization  may  be  reflective  [4],  in  which  the  sources  and  detectors  are  at  the  same  side  of  the  smart 
pixel  array  and  a  reflective  diffractive  element  is  used  for  optical  interconnections. 
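
The two connection sets can be written down directly. The following Python sketch is illustrative only; the helper names `ch_connection_set` and `is_rch_connection_set` are hypothetical, and the code assumes N is a power of two as stated for the array.

```python
def ch_connection_set(n: int) -> list[int]:
    """CH connection set {1, 2, 4, ..., n/2} for an n x n array."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 2")
    return [1 << k for k in range(n.bit_length() - 1)]


def is_rch_connection_set(cs, n: int) -> bool:
    """True if cs is a subset of the CH connection set for size n."""
    return set(cs) <= set(ch_connection_set(n))


print(ch_connection_set(32))
print(is_rch_connection_set({8, 16}, 32))
print(is_rch_connection_set({3, 16}, 32))
```

For N = 32 the CH set is [1, 2, 4, 8, 16], so {8, 16} qualifies as an RCH connection set while a set containing 3 does not.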

Both  the  CH  and  RCH  are  one-to-many  interconnections  which  introduce  contention,  i.e.,  one  detector  receives  data 
from  more  than  one  transmitting  PE  in  one  clock  cycle.  One  PE  not  only  contends  with  PEs  in  the  same  row  (or 
column)  but  also  contends  with  PEs  in  other  rows  (columns),  as  shown  in  Fig.  1.  To  solve  the  contention  problem  of 
the  CH,  one  time  multiplexing  algorithm  (called  the  even-distribution  or  ED  algorithm  here)  was  proposed  in  [4].  In 
addition, the minimum required number of time slots (M) was calculated using the even-distribution algorithm under
the  assumption  that  all  PEs  are  activated  to  communicate  at  the  same  time.  Here,  we  denote  the  physical  clock  cycle 
length  required  for  one  PE  to  send  data  to  any  of  its  electrically  or  optically  connected  neighbors  as  This  implies 
that  the  clock  cycle  length  of  a  time  slot  is  equal  to  Because  M  >  1,  it  was  also  shown  in  [4]  that  it  is  better  to 
use  the  electrical  mesh  for  local  shifts,  discard  the  fan-outs  for  local  shifts  in  the  optical  CH,  and  use  a  modified  CH 
(a  particular  type  of  RCH)  for  global  shifts.  Therefore,  the  idea  of  the  RCH  is  to  combine  the  advantages  of 
electrical  mesh  and  optical  CH. 

3.0  Time  Multiplexing  Schemes 

To  further  decrease  M,  we  discuss  four  techniques:  separable  row  and  column  (SRC)  interconnections;  allowed- 
receiver-contention  (ARC)  condition;  inactive-PE  (INPE)  condition;  and  set-distribution  (SD)  algorithm. 

3.1  Separable  Row  and  Column  (SRC)  Interconnections 

If one PE contends only with PEs in the same row or column, but not both, in one clock cycle, only one row
(column) of PEs needs to be considered in solving the contention problem. The corresponding optical
interconnections are called separable row and column (SRC) interconnections. One implementation of the SRC
interconnections is shown in Fig. 3. On each PE, half of the detectors support row interconnections and the other
half support column interconnections. Another implementation is to use optical sources or modulators which can
switch their output polarization between two orthogonal states (for row and column broadcasting) and a
polarization-selective computer-generated grating [5]. Compared to the first implementation in Fig. 3, the second
implementation requires only one detector per PE.

3.2 Allowed-Receiver-Contention (ARC) and Inactive-PE (INPE) Conditions

The ARC condition allows contention at a particular PE (called PEcont) when PEcont receives data from two
other PEs simultaneously, but PEcont - PEsource1 ≠ d and PEcont - PEsource2 ≠ d for a desired RCH data shift of d
units. Figure 4 shows a one-dimensional example. We assume PE0 and PE12 are transmitting data in the same time
slot for an RCH right shift with a distance of 2. Their corresponding destinations are PE2 and PE14, respectively; and
they have contention at PE4 and PE8. Now, if we instruct PE4 and PE8 to ignore any received data or simply
deactivate their receivers, we can neglect contention at PE4 and PE8.
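
A small Python sketch can make the ARC check concrete for two transmitters at addresses 0 and 12 performing a right shift of 2. The connection set CS = {2, 4, 8} and the row length of 16 are assumptions chosen so that the shared detectors fall at addresses 4 and 8; the text does not state them. The function names are hypothetical.

```python
def fanout(src, cs, n):
    """Detectors reached from src at distances in cs, both directions, no wrap-around."""
    return {src + s * d for d in cs for s in (1, -1) if 0 <= src + s * d < n}


def arc_allows(senders, cs, shift, n):
    """Return (contended detectors, True if none is a wanted destination)."""
    reach = [fanout(s, cs, n) for s in senders]
    contended = set()
    for i in range(len(reach)):
        for j in range(i + 1, len(reach)):
            contended |= reach[i] & reach[j]
    wanted = {s + shift for s in senders if 0 <= s + shift < n}
    return contended, contended.isdisjoint(wanted)


# Assumed parameters: CS = {2, 4, 8}, 16 PEs in the row.
contended, ok = arc_allows(senders=[0, 12], cs=[2, 4, 8], shift=2, n=16)
print(sorted(contended), ok)
```

With these assumed parameters the contended detectors are 4 and 8, neither of which is an intended destination (2 or 14), so the two transmitters may share a time slot once those receivers ignore their input.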

The INPE condition occurs at PEs in the boundary region of an array. PEs whose outputs for the desired RCH shift
fall off the end of an array can be disabled to eliminate them as a source of contention. For example, PEs in
columns N-d through N-1 of an N x N array can be deactivated so that they do not transmit data during a
desired RCH right shift d.

3.3 Set-Distribution (SD) Algorithm

We illustrate here the set-distribution (SD) algorithm for finding M corresponding to the RCH connection set. The
SRC interconnections are assumed, and we need consider only one row of PEs in the 2-D PE array. The
procedures of the SD algorithm are explained by using a simple example: let N = 32, CS = {8, 16}, and disregard the
ARC and INPE conditions. For this example, the SD algorithm consists of the following three steps:

1.) Find contention sets (COS). Element x in a contention set represents the PE with address x. In a contention set, any
element contends with at least one of the other elements in the same contention set. However, there is no contention
between two elements from different contention sets. Therefore, COS_i = {i, i+8, i+16, i+24}, i = 0, 1, 2, ..., 7.

2.) Find mutual exclusive sets (MES) in each contention set. In a MES, any element contends with all other elements in
the same MES. For this example, all elements in a contention set contend with each other. Therefore, MES_i =
COS_i, i = 0, 1, 2, ..., 7, and the required number of time slots (M) is equal to or greater than the number of elements
in a MES (4).

3.) Group PEs into contention-free groups (G). Each contention-free group will share one dedicated
time slot. Contention-free groups are derived by arbitrarily selecting one element from each MES. There are many
ways to group PEs. One of them is the following: G_j = {8j, 8j+1, ..., 8j+7}, j = 0, 1, 2, 3. Therefore M is 4. Since
M ≥ 4 as explained in 2.), the answer obtained is an optimal solution. We also have optimal solutions for 64 x 64
and 128 x 128 arrays.
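
The three steps can be reproduced for the N = 32, CS = {8, 16} example with a brute-force Python sketch. This is not the authors' procedure: it assumes that two PEs contend whenever their fan-outs (distances in ±CS, no wrap-around) share a detector, and all function names are hypothetical.

```python
from itertools import combinations


def fanout(pe, cs, n):
    """Detectors reached from pe at distances in cs, both directions, no wrap-around."""
    return {pe + s * d for d in cs for s in (1, -1) if 0 <= pe + s * d < n}


def contends(a, b, cs, n):
    """Assumed contention model: the two fan-outs share at least one detector."""
    return bool(fanout(a, cs, n) & fanout(b, cs, n))


def contention_sets(cs, n):
    """Step 1: connected components of the contention graph over PEs 0..n-1."""
    remaining = set(range(n))
    components = []
    while remaining:
        seed = min(remaining)
        component, frontier = {seed}, [seed]
        while frontier:
            a = frontier.pop()
            for b in sorted(remaining - component):
                if contends(a, b, cs, n):
                    component.add(b)
                    frontier.append(b)
        remaining -= component
        components.append(sorted(component))
    return components


def is_clique(group, cs, n):
    """Step 2: every pair in the group contends (a mutual exclusive set)."""
    return all(contends(a, b, cs, n) for a, b in combinations(group, 2))


def is_contention_free(group, cs, n):
    """Step 3: no pair in the group contends, so it can share one time slot."""
    return not any(contends(a, b, cs, n) for a, b in combinations(group, 2))


N, CS = 32, (8, 16)
cos = contention_sets(CS, N)
groups = [list(range(8 * j, 8 * j + 8)) for j in range(4)]
print(cos[0], all(is_clique(c, CS, N) for c in cos))
print(len(groups), all(is_contention_free(g, CS, N) for g in groups))
```

Under this contention model each contention set has four mutually contending PEs, so at least four time slots are needed, and the four groups G_0 to G_3 are contention-free, so four slots suffice.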


4.0 Numerical Results

In this section, the results are based on the following assumptions: 1) array size is 128 x 128; 2) for SRC design,
there is only one detector per PE for receiving data broadcast by PEs in the same row (column); 3) the time (t/j)
required for one PE to send data to any of its electrically or optically connected neighbors is the same for the mesh
and M-RCH.


Figure 5 shows the time (in units of t/j) required for 1-D (row-wise or column-wise) inter-PE communication with
various shift distances. For M-RCH, two cases are considered: 1) case 1: without SRC design, without ARC and
INPE conditions, using the ED algorithm; 2) case 2: SRC design, with ARC and INPE conditions, using the SD
algorithm. It can be seen that the M-RCH significantly improves the communication efficiency of the mesh.


According to the above results for 1-D inter-PE communication, communication speedups of M-RCH (case 2 only)
over the mesh can be derived for the following common operations and applications: 1) 1-D one-to-all broadcast,
1-D bit reversal, and 1-D perfect shuffle; 2) 1-D FFT, 2-D summation, bitonic sort, matrix transposition, and
matrix-vector multiplication. The 2-D summation calculates the sum of all data stored in a 2-D array. It is assumed that
initially each PE stores one data unit to be processed. The parallel algorithms for the M-RCH are designed mostly
based on the corresponding algorithms for the mesh. The communication speedup is defined as the ratio of the
communication time required by the mesh to the communication time required by case 2 of M-RCH. Tables 1 and 2
show speedups for various operations and applications. These results suggest a speedup spectrum for parallel
algorithms in which global communication is important for communication efficiency.


The M-RCH significantly improves the communication efficiency of the mesh and the four time multiplexing
techniques improve the performance of previous methods [4]. The speedup analysis for some common operations
and applications suggests a speedup spectrum for parallel algorithms in which global communication is important
for communication efficiency.


Fig. 1. Contention: the center PE contends with the other
two PEs at two PEs represented by shaded squares.

Fig. 2. A transmissive optical setup for the RCH, adapted
from [4].

Fig. 3. One implementation for the separable row and
column (SRC) interconnections, with CS = {2,4}. (Legend: source;
detector for row interconnection; detector for column interconnection.)

Fig. 4. An example of the allowed-receiver-contention
(ARC) condition.

communication with various shift distances.

Table 1. Communication speedup of M-RCH (case 2)
based on some common operations.

Table 2. Communication speedup of M-RCH (case 2)
based on some common applications.
Department of Electrical Engineering
McGill University

Free-space optical interconnects represent a solution to the needs of future
connection-intensive digital systems such as ATM switching systems and massively
parallel processing computer systems. These systems will require the large
board-to-board connectivity provided by an optical backplane which uses two-dimensional
arrays of passive, free-space Parallel Optical Channels (POCs) to optically
interconnect electronic Printed Circuit Boards (PCBs) and/or Multi-Chip Modules
(MCMs). Such a backplane could be capable of supporting terabit/second aggregate
capacities with connectivity levels on the order of 10,000 input/output channels per
PCB.


We  are  developing  the  optics  and  optomechanics  to  demonstrate  these  high 
bit-rate  optical  backplanes.  As  part  of  this  program,  we  are  constructing  a 
representative  portion  of  a  bi-directional  optical  backplane  capable  of 
interconnecting  two  printed  circuit  boards  which  utilize  FET-SEED  smart  pixel 
transceiver  arrays. 

The FET-SEED transmitter and receiver smart pixel circuits were designed
and fabricated using a batch fabrication process.(1,2) At the input, the drive FET
modulates the voltage across a Multiple Quantum Well (MQW) modulator pair
resulting in differentially modulated output light. The electrical input impedance
was designed for 50 ohms to ensure efficient coupling of high frequency signals,
which resulted in high speed operation of the optical modulators. Here, the high
speed optical modulation was detected using the MQW diode pair, fed to an
inverting amplifier section, and then amplified using power FETs (375 µm gate
width) designed to drive 100 ohm transmission lines on a PCB. Both the 4 x 4
transmitter and receiver array optical windows were 25 x 25 µm, separated by 50 µm,
with the pixel to pixel pitch being set at 200 µm.

The FET-SEED smart pixel arrays were mounted in high speed quad flat
packs that were subsequently installed onto the printed circuit boards via solderless
connectors. These connectors permitted impedance matching of the smart pixel
array input/output impedances to the 50 ohm printed circuit board transmission
lines. Measurements of the rising edges of the packaged transmitter and receiver
smart pixel array circuits yielded 0.811 ns and 2.57 ns respectively, in good agreement
with device and circuit models. These circuits were designed to run at 155 Mbits/sec
in parallel.

A bulk optics system has been designed which is capable of supporting a
bi-directional data link between two FET-SEED smart pixel arrays. A schematic outline
of the circuit is shown in Figure 1. Linearly polarized light from a Ti-Sapphire laser
operating at 850 nm enters the system via two single mode polarization preserving
fibers. The array generation set-up uses periodic binary phase gratings to generate
4 x 8 arrays of optical beams at the FET-SEED modulator arrays on either device plane.
The modulated signal beams are then imaged via a 4-f relay system onto the
opposite receiver arrays, thereby allowing two way communication between the
PCBs. Risley beam steerers were used for fine positioning of the optical beams.

Details  of  the  optical  design,  optomechanics  and  alignment  analysis  will  be 
given.  In  addition,  a  theoretical  analysis  of  a  bi-directional  lenslet  array  based 
interconnection  link  will  be  presented. 

Figures 2a and 2b show preliminary eye diagrams of the performance of a
unidirectional data link operating at 50 Mbits/sec and 155 Mbits/sec respectively
with the system working over one channel. The measured switching energy for
these measurements was 50 fJ/bit. Additional system performance measurements
and system characterization will be presented including a full 16 channel
interconnection, bit error rate analysis of the data streams, optical losses and optical
power budgets.
Interconnection Theory and Optoelectronic Computing Architectures


Haldun  M.  Ozaktas 
Various optically interconnected computer architectures are compared based on a number of
considerations  including  interconnection  density  and  heat  removal. 
High-Density 300-Gbps/cm² Parallel Free-Space Optical Interconnection Design Considerations

Dean  Z.  Tsang 

Lincoln  Laboratory,  Massachusetts  Institute  of  Technology 
Lexington,  MA  02173-9108 

A  high-density,  high-throughput  SOO-Gbps/cm^  parallel  free-space  optical  interconnection 
has  been  designed  and  demonstrated.  The  impact  of  component  technology  choices  and  optical, 
electrical,  and  mechanical  issues  will  be  discussed  within  the  context  of  this  prototype  system  with 
implications  for  future  systems.  Many  of  the  design  considerations  for  this  prototype  are  common 
to  other  optical  interconnection  and  processing  systems  based  on  arrays  of  components.  The 
prototype,  a  linear  array  parallel  free-space  optical  interconnection  with  up  to  twenty  optical  data 
paths,  operated  at  a  rate  of  up  to  2.8  Gbps  per  optical  data  path  with  a  delay  or  latency  of  200  ps. 

Mechanical  alignment  and  placement  accuracy  are  major  issues  for  both  lens-based  and 
holographic-based  free-space  optical  interconnections.  Mechanical  placement  and  alignment 
accuracies in the best of present commercial digital systems are impressive, ±10 µm. The
optoelectronic components and optics, however, have to be aligned to even higher accuracy, ±2
µm or better, for the components used here. Several approaches to system-level mechanical
alignment  for  lens-based  free-space  interconnections  include  (1)  a  telecentric  approach  with  a  single 
large  lens  and  lasers  or  modulators  in  the  object  plane  and  receivers  in  the  image  plane,  ^  (2)  a 
telecentric approach with an object plane and image plane connected by a large collimating lens and
a  large  focusing  lens,  and  (3)  individual  collimation  and  focusing  lenses  for  each  data  path^  (Fig. 
1).  The  effects  of  positional  and  angular  alignment  on  optical  efficiency  for  the  third  approach  are 
considered here. For our prototype, the board assembly alignment was assured to within 10 µm,
both  on  a  board  and  between  boards.  The  design  of  optoelectronic  transmitter  and  receiver 
modules which must function with this degree of misalignment has been addressed with optical
modules in which ±2 µm alignments are performed during module assembly. The modules then
need only to be placed within a 10 µm latitude. The lenses have been sized such that the receiver is
within the near field of the transmitter lens for high optical efficiency and low crosstalk. The lens
diameters, 120 µm for the transmitter and 135 µm for the receiver, were chosen as a tradeoff
between mechanical tolerances, interconnect density, and diffraction. The prototype design has
low bit-error rates (below lO'^^) for transverse misalignments up to ±40 µm (Fig. 2). Much
greater latitude is possible with this general approach when applied to lower density applications.
For example, with 2 mm lenses a ±700 µm range is possible.

Fig. 1. Microlenses are used for collimation and focusing in each data path. (Diagram labels: transmitter module, receiver module.)


Fig. 2. Low bit error rates are measured over an 80 µm range transverse to the linear array. (Horizontal axis: lateral position, µm.)

Another mechanical alignment requirement is determined by the receiver field of view, the
maximum range of receiver module tilt before the light misses the detector, which has diameter D.
The field of view for a lens with focal length f is θ = 2 tan^-1(D/2f). The field of view can be
increased with a short-focal-length lens and a large detector but must be optimized together with
speed of response. The 50-µm-diameter detector used in this application is consistent with 100-ps
system transition times and a tilt of ±5 degrees. This system can, therefore, preserve rise and
fall times of advanced electrical packaging technology such as multichip modules between boards.
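
The field-of-view relation can be evaluated numerically with a few lines of Python. This is only a sketch: the function names are hypothetical, the receiver focal length is not given in the text, and reading the ±5 degree tilt as a full field of view of 10 degrees is an assumption made for the sake of the example.

```python
import math


def field_of_view_deg(detector_diameter_um: float, focal_length_um: float) -> float:
    """Full field of view, theta = 2 * atan(D / (2 f)), in degrees."""
    return math.degrees(2 * math.atan(detector_diameter_um / (2 * focal_length_um)))


def focal_length_for_fov_um(detector_diameter_um: float, fov_deg: float) -> float:
    """Focal length that gives the requested full field of view."""
    return detector_diameter_um / (2 * math.tan(math.radians(fov_deg) / 2))


# Assumption: a tilt of +/-5 degrees corresponds to theta = 10 degrees.
f_um = focal_length_for_fov_um(50.0, 10.0)
print(round(f_um, 1), round(field_of_view_deg(50.0, f_um), 3))
```

Under that reading, a 50 µm detector would pair with a focal length of roughly 286 µm; halving the focal length roughly doubles the field of view, which is the tradeoff against detector size and speed noted in the text.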

The  second  type  of  angular  misalignment  is  transmitter  module  tilt  relative  to  the  receiver. 
The  fraction  of  the  beam  collected  by  the  receiver  lens  is  given  by  an  overlap  integral  between  the 
transmit  beam  and  the  receiver  lens  aperture.  This  fraction  is  a  function  of  the  angle  of 
misalignment  and  the  separation  between  lenses.  The  allowable  angle  for  the  prototype  design  is 
a few tenths of a degree for separations of a few millimeters and is consistent with advanced
electrical  packaging  technology. 

Another  set  of  tolerances  is  imposed  on  the  positions  of  components  within  an  array.  The 
center-to-center  spacing  of  some  of  the  components  is  especially  critical.  For  the  prototype 
system, the laser spacing has to match the transmitter lens spacing to within 0.1 µm for low
crosstalk. This is achievable with photolithographic processing, but particular care is necessary,
especially with different mask aligners or with components that do not maintain the center-to-center
spacing through the manufacturing process. Special qualification of photoformed glass lenses was
required to achieve the desired accuracy in the prototype system.
While much of the promise of optical interconnections lies in their high-speed signal integrity vis-à-vis
electrical interconnections, in practice careful electrical packaging is required to achieve this
end. In particular, there is a tradeoff between interconnect density and crosstalk at high
frequencies. The prototype interconnection will be shown to have a total small-signal crosstalk
between nearest neighbors of -30 dB, including electrical crosstalk and optical crosstalk of the
optical interconnect system and the test fixture, at a frequency of 1 GHz. A significant limit to the
density achievable with optical interconnections is due to practical limits on the electrical packaging
technology used for the transferral of electrical signals to and from the optical interconnections.

Ray-tracing  programs  were  used  in  the  design  and  evaluation  of  aberrations  in  the 
prototype system which contains an asymmetric biconvex transmitter lens and a plano-convex
receiver  lens.  The  evaluation  showed  that  a  proper  design  can  yield  high  efficiency  and  low 
crosstalk  even  in  the  presence  of  significant  spherical  aberrations. 

The  choice  of  component  technologies  is  important.  Thermal  issues,  for  example,  are  less 
important  for  980-nm  diode  lasers  than  other  diode  lasers  because  their  output  is  less  sensitive  to 
temperature.  Although  there  are  no  thermoelectric  coolers  or  feedback  in  our  prototype  system, 
there is no significant variation in laser output over 20 to 40 °C, well beyond the specified operating
range. 


Efficient  low  threshold,  high-speed  lasers  make  system  designs  with  very  low  delays  and 
very  low  skew  possible.  With  many  common  high-speed  transistor  technologies,  each  factor  of 
ten  in  gain  requires  a  stage  of  gain  with  ~100  ps  delay.  Thus  a  minimum  number  of  gain  stages  is 
desirable.  The  980-nm  lasers  have  thresholds  of  ~3  mA  and  are  easily  driven  at  many  times 
threshold  with  the  ~30  mA  available  from  emitter-coupled  logic  compatible  output  drivers.  There 
is  no  laser  driver  to  add  delay.  Efficient  optics  yields  ~1  mA  of  photocurrent  in  the  detector.  A  3- 
stage  GaAs  heterojunction  bipolar  transistor  receiver  converts  this  photocurrent  into  an  emitter- 
coupled  logic  compatible  output.3  The  entire  system,  including  both  transmitter  and  receiver,  has  a 
measured  delay  of  about  170  ps  with  a  skew  between  channels  of  about  25  ps.  The  relatively  high 
signal  levels  also  increase  system  immunity  to  crosstalk  and  variations  in  power  supply  voltages. 

The  overall  electrical  system  design  is  DC-coupled  thus  avoiding  the  need  for  coding  which 
adds  complexity  and  delay  to  both  transmitter  and  receiver.  The  high  signal  levels  allow  low-gain 
amplifiers  to  be  used  and  thus  avoid  deleterious  effects  of  input  offset  current  drift,  1/f  noise,  and 
other  issues  associated  with  high-gain  DC-coupled  receiver  electronics.  Experimentally,  there 
were  no  errors  in  >15  hours  at  2  Gbps  with  2^^-l  pseudorandom  sequences. 

The  results  presented  here  demonstrate  dense  parallel  arrays  of  efficient  free-space  optical 
interconnections  with  tremendous  data-handling  capacity.  The  interconnection  has  a  very  high 
density with a data capacity per unit area of up to 300 Gbps/cm².
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(1)  Introduction.  The  use  of  diffractive  optical  elements  (DOE's)  and  microlens  arrays  in  optical 
interconnection  systems  can  potentially  provide  high  throughputs  and  small  system  volumes,  using  components  that 
are  amenable  to  automated  design  and  mass  production  techniques.  This  paper  considers  fixed-weight  neural 
network  interconnections  based  on  such  components,  and  focuses  on  the  realm  of  small  system  volumes  and  short 
propagation  lengths  (~1  mm)  with  potential  for  cascading  into  a  compact,  multilayer  free-space  system. 

We  consider  the  space-variant  interconnection  system  of  Fig.  1,  and  achieve  short  propagation  lengths  by 
restricting  each  fanout  pattern  to  a  local  neighborhood.  The  system  uses  an  array  of  NxN  sub-DOE's  at  the  input 
plane to connect to an array of NxN detectors at the output plane. For this locally connected neural network
interconnection,  each  sub-DOE  stores  one  weighted  fanout  pattern  that  connects  to  MxM  nearest  neighbors  in  the 
output  plane.  The  beam  incident  on  each  sub-DOE  comes  from  a  modulator  or  an  emitter  (not  shown),  which 
represents  an  interconnection  input  node  (e.g.,  an  output  of  a  neuron  unit).  The  optics  of  the  interconnection  system 
provides  a  Fourier  transform  (in  magnitude)  from  each  sub-DOE  to  the  detector  array,  which  serves  as  a  set  of 
neuron  unit  inputs.  We  constrain  the  sub-DOE  spacing  to  be  equal  to  the  detector  spacing  to  allow  for  use  in 
multilayer systems. A globally connected space-variant system can be realized similarly by replacing the microlens
array with a bulk lens and letting each input node connect to all output plane detectors [1,2]. The minimum
propagation distance and system volume can be shown to be approximately proportional to $M$ and $N^2 M$ for the local
system, and $N$ and $N^3$ for the global system, respectively. While the local system can be used in a smaller volume,
its  crosstalk  levels  can  be  high  due  to  reconstruction  noise  (e.g.,  diffraction  orders  outside  of  the  local  fanout 
neighborhood)  of  each  sub-DOE.  In  this  paper,  we  describe  a  crosstalk  reduction  method  enabled  by  varying  the 
mapping  from  reconstructed  spot  locations  to  detector  locations.  Several  novel  DOE  designs  for,  and  simulations  of, 
local  fixed-weight  neural  network  interconnections  are  evaluated  as  a  test  of  this  method. 

(2)  Interconnection  crosstalk.  The  ideal  reconstructed  intensity  pattern  of  a  DOE  is  given  by  its  power 
spectrum,  which  is  periodic  except  for  a  gradual  (nonmonotonic)  tapering  off  of  higher  diffraction  orders  due  to  the 
finite  size  of  the  DOE  phase  elements.  Such  a  reconstructed  intensity  pattern  will  consist  of  desired  signals, 
sidelobes  of  signals  (SS),  and  spurious  diffraction  orders  (SDO),  as  shown  in  Fig.  2(a).  For  the  optical  system  of 
Fig.  1 ,  the  reconstructed  intensity  pattern  from  each  sub-DOE  will  be  a  relatively  accurate  rendition  of  its  power 
spectrum  in  a  local  region  near  the  microlens  optical  axis,  but  will  degrade  due  to  aberrations  and  nonparaxial  effects 
farther  away  from  the  axis.  To  approximate  these  effects,  we  model  the  local-region  reconstruction  as  the  exact 
power  spectrum  of  the  sub-DOE  as  shown  in  Fig.  2(a),  and  the  reconstruction  everywhere  outside  this  local  region 
as  a  uniform-intensity  blur.  For  the  system  scale  sizes  of  interest,  we  assume  that  this  local  region  is  of  size  equal  to 
two  DOE  reconstruction  periods  in  each  dimension. 

We define the crosstalk, $\beta$, as [noise power] / [signal power] from all the sub-DOE's with all input nodes fully on
at a given output-plane detector. By this definition, $\beta$ is not a worst-case measure, but is meant to be more indicative
of the average crosstalk level. In the optical system, two major components of the crosstalk are $\beta_{11}$, due to signal
sidelobes, and $\beta_{12}$, due to spurious diffraction orders (Fig. 2(a)). Other components of the crosstalk arise due to the
local-region tails of all reconstructed spots (crosstalk component $\beta_2$), and due to the uniform intensity in each
blurred region (crosstalk component $\beta_3$). Taken together, these crosstalk components can be reduced at the cost of
increased system size, either by using a larger sub-DOE area to force the reconstructed spot size to be much smaller
than the detector size, or by increasing the detector spacing. On the other hand, $\beta_1 = \beta_{11} + \beta_{12}$ is independent of
system size and detector spacing for a given DOE design. Theoretically, the average $\beta_1$ (over all detectors in the
array) will be approximately $1/\eta_1 - 1$, in which $\eta_1$ is the average diffraction efficiency over all sub-DOE's. Since
$\beta \approx \beta_1 + \beta_2 + \beta_3$, increasing the system size can at best reduce the overall crosstalk to $\beta_1$.

To  evaluate  these  crosstalk  terms,  a  weighted  interconnection  with  an  array  of  128x128  nodes  in  both  the  input 
and  output  planes  has  been  simulated.  Nine  different  sub-DOE's  were  designed  using  the  Gerchberg-Saxton 
algorithm  [3],  each  of  which  connects  an  input  node  to  3x3  nearest  neighbors  in  the  output  plane  with  randomly 
chosen connection weights between zero and one. Each of the required 16,384 sub-DOE's was randomly selected
from this set of nine sub-DOE designs. Figure 3 shows the resulting crosstalk for 16-phase-level sub-DOE's
designed with 8x8 phase elements in one period. As shown, the overall crosstalk can be reduced by decreasing the
spot size or marginally increasing the detector spacing (thereby reducing $\beta_2 + \beta_3$, as predicted above).
Unfortunately, $\beta_1$ cannot be reduced by changing these parameters, and given our definition of crosstalk, the
resulting value of $\beta_1 = 0.164$ is likely too high for many neural systems.
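
The size-independent crosstalk floor can be related back to diffraction efficiency. The sketch below reads the relation given earlier as floor ≈ 1/efficiency − 1 and inverts it; the function names are hypothetical, and the only number taken from the text is the simulated floor of 0.164.

```python
def crosstalk_floor(efficiency: float) -> float:
    """Size-independent crosstalk floor, read as 1/efficiency - 1."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must lie in (0, 1]")
    return 1.0 / efficiency - 1.0


def efficiency_for_floor(floor: float) -> float:
    """Average diffraction efficiency implied by a given crosstalk floor."""
    if floor < 0.0:
        raise ValueError("crosstalk floor cannot be negative")
    return 1.0 / (1.0 + floor)


implied = efficiency_for_floor(0.164)  # floor reported for the baseline design
print(f"implied average diffraction efficiency: {implied:.3f}")
```

Under this reading, a floor of 0.164 corresponds to an average efficiency of roughly 86%, so even fairly efficient sub-DOE's leave a noticeable noise floor when every diffraction order lands on a detector.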

(3) Crosstalk reduction by special DOE design. We now consider employing an alternative DOE design to
reduce $\beta_1$. Our approach is to rearrange the reconstruction so that some of the signal sidelobes and spurious diffraction
orders fall in off-detector locations in the output plane. We accomplish this by inserting $\Gamma - 1$ spurious diffraction
orders between every pair of signal orders in each dimension when designing the sub-DOE's. Then, for the crosstalk
reduction parameter $\Gamma > 1$, only 1 out of each set of spurious diffraction orders will fall on the detectors (Fig. 2(b)
and (c)), and the DOE design process can be used to suppress the detected spurious diffraction orders at the expense
of nondetected spurious diffraction orders. The Gerchberg-Saxton algorithm was modified to incorporate this
capability, and additional sets of simulations (details described above) were performed for values of $\Gamma > 1$. [In order
to keep the system volume constant and satisfy the above-mentioned cascadability constraint on the input node and
output node spacings, the oversampling ratio $B$ (defined as the number of phase elements per DOE period divided by
the fanout in each dimension) was increased in the same proportion as $\Gamma$.]
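
The bookkeeping behind this idea can be illustrated with a small one-dimensional sketch. It is not the authors' design code: it assumes that a diffraction order k lands on a detector only when k is a multiple of the crosstalk reduction parameter (called `gamma` here), that the signal occupies the 3 on-detector orders centred on k = 0, and that the DOE period is 8 phase elements scaled in proportion to the parameter. All names are hypothetical.

```python
def classify_orders(period: int, fanout: int, gamma: int) -> dict:
    """Sort the diffraction orders of one reconstruction period (1-D).

    Assumptions (illustrative only): order k lands on a detector only when
    k is a multiple of gamma, and the signal occupies the `fanout`
    on-detector orders centred on k = 0.
    """
    half = fanout // 2
    groups = {"signal": [], "detected_spurious": [], "off_detector": []}
    for k in range(-(period // 2), period - period // 2):
        if k % gamma != 0:
            groups["off_detector"].append(k)
        elif abs(k // gamma) <= half:
            groups["signal"].append(k)
        else:
            groups["detected_spurious"].append(k)
    return groups


for gamma in (1, 2, 3):
    groups = classify_orders(period=8 * gamma, fanout=3, gamma=gamma)
    print(gamma, {name: len(orders) for name, orders in groups.items()})
```

In this toy model the number of spurious orders that reach a detector stays fixed while the number of off-detector orders grows with the parameter, which gives the design algorithm more places to park unwanted energy.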

For $\Gamma = 2$, spurious diffraction order crosstalk ($\beta_{12}$) should be reduced substantially (depending on the degree to
which the DOE design algorithm can suppress the detected spurious diffraction orders). However, it can be shown
that signal sidelobe crosstalk ($\beta_{11}$) will not be reduced if the effective DOE oversampling ratio, $B/\Gamma$, is constant.
This behavior is verified by our simulations, as shown in Fig. 4.

To further lower $\beta_1$, signal sidelobe crosstalk ($\beta_{11}$) can be reduced by moving some or all of the signal sidelobes
to off-detector locations. This can be achieved by setting $\Gamma = 3$, thus inserting two spurious diffraction orders
between each pair of signal orders (Fig. 2(c)). In our case, signal sidelobe crosstalk ($\beta_{11}$) should theoretically be
eliminated. Figure 4 verifies this prediction and shows that the design algorithm also achieved a further reduction in
$\beta_{12}$. The crosstalk term $\beta_1$ is seen to be reduced by more than an order of magnitude in going from $\Gamma = 1$ to $\Gamma = 3$.

Figure 5 shows the total simulated crosstalk for the case of $\Gamma = 3$. These results show that $\beta_1$ has been reduced to
the point where it is no longer the dominant component of the total crosstalk. The other crosstalk components,
$\beta_2 + \beta_3$, have increased somewhat (compared with the $\Gamma = 1$ case of Fig. 3), because of a lower average sub-DOE
diffraction efficiency. Even so, the total crosstalk is significantly reduced for most parameter values of interest.

(4) Discussion. The idea of inserting spurious diffraction orders in between signal orders can be usefully
extended to any prime integer $\Gamma > 3$. For a local-region reconstruction area of LxL DOE reconstruction periods, in
theory $\beta_{11}$ will be approximately reduced by a factor of $\Gamma^2$ for $\Gamma < L/2$, and will be zero for $\Gamma > L/2$ (provided the
effective DOE oversampling ratio $B/\Gamma$ is held constant). Physically larger optical systems will generally have a
larger local-region reconstruction area, and should therefore benefit from values of $\Gamma$ larger than 3. On the other
hand, the reduction of $\beta_{12}$ will depend on how effectively the DOE design algorithms suppress the detected spurious
diffraction orders. Our design programs tended to reduce $\beta_{12}$ as $\Gamma$ increased (Fig. 4), showing additional preference
for larger $\Gamma$. However, the spacing between adjacent reconstructed spots gets smaller as $\Gamma$ becomes larger. At some
value of $\Gamma$ the reconstructed spots will begin to overlap, and crosstalk performance will degrade; this phenomenon
was verified in our $\Gamma = 3$ simulations (e.g., y-axis intercept of top curve in Fig. 5). This will constrain the maximum
allowable value of $\Gamma$ for a given set of physical dimensions. For neuron unit array devices that have smarter pixels,
more device area for electronics is required so that larger values of $\Gamma$ can be accommodated.

There are two other advantages of using $\Gamma > 1$. The first is the potential to reduce the propagation distance (and
hence the system volume). It can be shown that the propagation distance is proportional to the effective DOE
oversampling ratio ($B/\Gamma$) for a given set of independent system parameters (such as N, M, wavelength, detector size,
detector spacing, electronics size, and DOE minimum feature size). Therefore, for a given DOE grating period
(constant $B$), increasing $\Gamma$ will reduce the propagation distance by a factor of $\Gamma$; other simulations we have performed
show that it will also reduce the crosstalk component $\beta_1$. Secondly, our design program showed the ability to trade
off ~10% to 20% of the DOE diffraction efficiency, $\eta$, for reductions in $\beta_{12}$. This provides an additional degree of
freedom for system design. The results shown above correspond to DOE designs that favored low $\beta_{12}$ over high $\eta$.

(5)  Conclusion.  A  crosstalk  reduction  method  with  the  potential  to  reduce  system  size  was  described  and  a  DOE 
design  algorithm  that  incorporates  this  method  was  developed.  Its  validity  was  verified  by  simulating  a  128x128 
fixed-weight  neural  network  interconnection  layer.  Under  our  modeling  assumptions,  this  method  reduces  crosstalk 
for  physically  small  systems  (results  shown)  as  well  as  for  larger  systems.  Similar  analyses  should  be  applicable  to 
digital  local  space-variant  parallel  systems  and  digital  or  analog  space-invariant  systems. 

Fig. 2: (a) Typical reconstructed power spectrum of a DOE (with period 8) consists of the desired signal, signal sidelobes (SS), and
spurious diffraction orders (SDO). (b) The reconstruction for $\Gamma = 2$. (c) The reconstruction for $\Gamma = 3$. Diffraction orders of signal, and
the first signal sidelobe are numbered. DOE periods have been increased to 16 ($\Gamma = 2$), and 22 ($\Gamma = 3$) to hold constant the
propagation distance and system volume.


Fig. 4: $\beta_{11}$ and $\beta_{12}$ for $\Gamma$ = 1, 2, 3. The effective oversampling ratio ($B/\Gamma$) is kept constant for different $\Gamma$.

Fig. 5: Total crosstalk ($\beta$) ($\Gamma = 3$) for a 128x128 locally connected space-variant neural network with 3x3 weighted connections.
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The  Optical  Transpose  Interconnection  System 

The optical transpose interconnection system¹ (OTIS) is a simple means of providing a transpose
interconnection using only a pair of lenslet arrays. This system has been shown useful for shuffle based multi-stage
interconnection networks, mesh-of-trees matrix processors, and hypercube interconnections. The transpose
interconnection is a one-to-one interconnection between L transmitters and L receivers, where L is the product of two
integers, M and N. To implement the interconnection, a √N x √N array of lenslets is placed in front of the input
plane, and a √M x √M array of lenslets is located before the output plane. The MxN transpose is equivalent to a
k-shuffle², where k equals N. For example, a 4096 channel (M = N = 64) interconnection can be implemented with
two 8x8 lenslet arrays. Figure 1 shows the side view, and actual input and output for such a system. An
interesting application occurs when k = √L; in this case only one stage of optics and two stages of optoelectronic
switches are required for full routing between the input and output channels. If M = N, then both lenslet planes are
identical; and with minor modifications, such as opaque areas on the lenses to prevent cross-talk, the system can be
made bi-directional.
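
The one-to-one transpose mapping can be written down as an index permutation. The sketch below is only an illustration: the text states that L = MN transmitters are connected one-to-one to L receivers in a transpose pattern, while the particular indexing (transmitter p = i*N + j mapped to receiver q = j*M + i) and the function name are assumptions made here.

```python
def otis_transpose(p: int, M: int, N: int) -> int:
    """Receiver index for transmitter p under an M x N transpose.

    Assumed indexing: p = i*N + j with 0 <= i < M and 0 <= j < N
    is connected to receiver q = j*M + i.
    """
    if not 0 <= p < M * N:
        raise ValueError("transmitter index out of range")
    i, j = divmod(p, N)
    return j * M + i


M, N = 2, 3
mapping = [otis_transpose(p, M, N) for p in range(M * N)]
print(mapping)
```

Each receiver appears exactly once in the mapping, and for M = N applying the mapping twice returns every channel to its starting index, consistent with the remark that the symmetric system can be made bi-directional.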

Computer  Simulation  and  Optimization 

We have modeled, using Code V® software³, 256 channel (M = N = 16) OTIS systems with arrays
consisting of plano-convex refractive lenslets as well as spherical and aspheric diffractive lenslets. Optimization
goals are to maximize throughput and minimize spot size on the output plane. First order geometrical
approximations determined the initial design of each system given fixed parameters such as 448 µm source spacing,
f/4 optics, and unit system magnification. We optimized the system for minimum wavefront variance along
representative interconnect paths (straight through, diagonal, etc.; see Figure 1). Initial results are as follows:

100% Encircled Energy Diameter (µm) / Wavefront Error (Strehl)

Field            Refractive         Spherical Diffractive
On-axis          57.04 / 0.936      55.11 / 1.000
Intermediate     124.86 / 0.524     22.05 / < 0.5
Maximum Field    249.68 / < 0.5     83.34 / < 0.5


Surprisingly, the aspheric terms did not have the desired effect of improving off-axis performance; this is
most likely due to the wide range of interconnect paths which must be supported by a single lens function. We are
currently modeling, using non-sequential surfaces, systems in which each lenslet in the array is independently
optimized for the interconnections it is required to support. Preliminary results show that this approach, along with
modifications to the merit function, will significantly improve system performance. For large scale systems, this
optimization may grow to be unmanageable. Fortunately, the symmetry of OTIS allows us to limit the number of
lenslets which need to be independently optimized. As shown in Figure 1, both top lenslets perform the same
function. Furthermore, the symmetry in OTIS limits the system to $\sum_{i=1}^{\sqrt{N}/2} i$ unique lens functions, for a
symmetric transpose (M = N). For example, a 256 channel symmetric transpose system (M = N = 16) has only
three unique lens functions, and a 4096 channel system (M = N = 64) has ten.
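
The two counts quoted above (three unique lens functions for M = N = 16 and ten for M = N = 64) are consistent with a triangular number of √N/2. The sketch below assumes that reading of the symmetry argument; the function name is hypothetical and the formula should be treated as an inference from the quoted examples.

```python
import math


def unique_lens_functions(n: int) -> int:
    """Number of distinct lens functions for a symmetric transpose (M = N = n).

    Assumes the count is 1 + 2 + ... + sqrt(n)/2, which requires n to be a
    perfect square with an even square root.
    """
    root = math.isqrt(n)
    if root * root != n or root % 2 != 0:
        raise ValueError("n must be a perfect square with an even root")
    half = root // 2
    return half * (half + 1) // 2


for n in (16, 64):
    print(n * n, "channels:", unique_lens_functions(n), "unique lens functions")
```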

Photorefractive  Beam  Splitter 

Interconnection  systems  with  light  modulators  as  transmitters  require  a  beamsplitter  or  equivalent 
component.  This  element  is  necessary  to  direct  illumination  light  to  the  modulators  and  provide  low  losses  on  the 
interconnect path. The traditional component used is a Polarizing Beam Splitter (PBS) in combination with a
quarter-wave retardation plate; however, PBS's have major drawbacks in OTIS. PBS's have a limited angular
acceptance range, typically ±5°; exceeding this range results in polarization 'crosstalk' and lessened overall efficiency.
Staying within this range leads to high f/# optics, given by:


$$f/\# \ge \frac{1}{\sqrt{2}\,\tan 5^\circ}\left(\frac{\sqrt{M}}{\sqrt{M}+1} + \frac{\sqrt{N}}{\sqrt{N}+1}\right)$$


For example, a 256 channel system (M = N = 16) would be limited to f/12.9 or greater optics, and a 4096 channel
system (M = N = 64) would be limited to f/14.1. Note that the total system length is proportional to the f/#, and
low f/# lenses, both refractive and diffractive, are available. Thus, a PBS unnecessarily increases the system length.

We replace the PBS with a volume hologram recorded in iron-doped lithium niobate (Fe:LiNbO3); see
Figure 2. Such an element utilizes the Bragg selectivity of a volume grating rather than the polarization selectivity
of  a  PBS  to  distinguish  between  the  illumination  and  interconnect  paths.  Incident  plane  wave  illumination  may  be 
diffracted  towards  the  modulators  with  good  efficiency,  since  only  one  hologram  is  recorded;  while  the 
interconnection  paths  (composed  of  off-axis  convergent  and  divergent  beams  not  meeting  the  Bragg  condition)  suffer 
only  minimal  losses  due  to  surface  reflections  and  absorption. 

Theoretical analysis is promising. Analysis based on coupled mode equations predicts peak efficiency
(theoretically 100%, but practically we can expect ~60% for a single volume transmission grating in Fe:LiNbO3)
achievable over a wide range of incident angles, given the proper exposure; and a Bragg selectivity (angular deviation
away from Bragg condition at which the diffraction efficiency has fallen to 1 / of maximum) of better than
6 arc-minutes. Experiments to verify these performance predictions are in progress; results will be presented.

If  the  modulators  are  illuminated  normally  there  will  be  light  losses  in  the  system  due  to  vignetting 
(clipping  of  the  light  cones  reflected  from  the  modulators)  since  the  OTIS  lenslets  do  not  extend  to  the  edge  of  the 
chip (the losses amount to 89% for a corner modulator). In order to achieve good light coupling into the
interconnect  lenslets,  the  modulators  require  directed  illumination.  This  can  be  achieved  by  using  off-axis 
(decentered)  area-multiplexed  diffractive  lenslet  arrays.  Figure  3  shows  an  ‘unfolded’  illumination  system  for  a  256 
channel system (M = N = 16), and a detail of the overlap of the illumination lenslets. Both illumination and
interconnection optics can be combined in the same element by using Birefringent Computer Generated Holograms⁴
(BCGH); see Figure 2. A BCGH has two different impulse responses for the two orthogonal states of polarization.
Therefore, it can be used to implement both the area-multiplexed illumination lenslets and the OTIS lenslets.

Analytic System Modeling

As part of the ongoing effort at UCSD in device and system modeling, we have completely modeled an
optoelectronic interconnection network. Analytic models for the switches⁵, transmitters, detectors, and associated
electronics⁶ have been derived elsewhere. Wavelength, laser noise, number of phase levels and minimum feature size
of the diffractive lenslets, alignment and fabrication errors, surface reflections, absorption, and scattering are the
parameters included in the modeling of the optical system. Bandwidth and bit error rate are performance metrics;
total power consumption, power dissipation per unit area on chip, area, and volume determine system cost. As an
example, Figure 4 shows the total power consumption as a function of network size. Complete results of this
modeling will be presented.
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Figure  1:  Optical  Transpose  Interconnection  System 
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Figure  2:  Illumination  /  Interconnection  Optics  &  Photorefractive  Beam  Splitter 

A fiber image guide, whether a coherent fiber bundle or a single gradient-index
fiber, is known to be useful for transmitting image signals. Such fiber image guides
have been successfully used in various medical endoscopic and industrial inspection
applications [1,2]. High resolution analog images can be obtained for transmission
distances of several meters or longer. Depending on the fiber material used, relatively
low-loss transmission can be achieved at certain transmission wavelengths.

Modern information-oriented science and technology are mainly driven by the
rapid advances in computer technology. One visible trend in computer hardware technology
is that central processing units (CPUs) process data in larger and larger parallel
formats, from 8 bits in the early 1980s, to 16 bits in the mid 1980s, and to 32 bits, 64 bits, or
more in the 1990s. In order not to suffer unnecessary delays, technology for parallel
communication channels between such CPUs and memory or input/output (I/O) devices
must also be rapidly developed. Unfortunately, due to inherent bandwidth limits and
electronic interference, large-bandwidth parallel electronic communication channels are
very difficult to establish, especially where the communication length is long,
say longer than a few centimeters [3]. One solution to this problem borrows
optical fiber telecommunication technology, in which a large amount of parallel
information is transmitted in a time-multiplexed serial format. One drawback of this kind of
scheme is that as the bit rate in each parallel channel increases, the electronic hardware for
multiplexing and demultiplexing experiences an increasing burden. For example,
for a moderately high bit rate of 500 MHz per bit channel, a 32-bit communication link has to
use a pair of 16 GHz multiplexer/demultiplexers, making the hardware very difficult to
develop cost-effectively.
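
The 16 GHz figure follows directly from multiplying the per-channel rate by the word width. A minimal sketch of that arithmetic, with a hypothetical helper name and ignoring any framing or coding overhead (which the text does not discuss):

```python
def serial_rate_hz(per_channel_rate_hz: float, channels: int) -> float:
    """Aggregate rate needed to time-multiplex `channels` parallel bit streams."""
    return per_channel_rate_hz * channels


rate = serial_rate_hz(500e6, 32)  # 500 MHz per bit channel, 32-bit word
print(f"required multiplexer rate: {rate / 1e9:.0f} GHz")
```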

The present research was motivated by the fact that the technology behind fiber
image guides is readily available and relatively mature, and is well suited to
computer-oriented parallel digital interconnection applications. Since the interconnection
distance is typically from a few centimeters to a few meters, the absorption loss as well as
long-distance crosstalk between imaging pixels of such a fiber image guide will not be a
serious concern.

In Fig. 1, a basic bit-parallel one-way optical data transmission system for sending
parallel  messages  between  two  digital  chips  or  boards  is  depicted.  The  system  comprises 
the  following  components:  an  input  electronic  board  by  which  the  bit-parallel  digital 
electronic  data  are  to  be  transmitted,  a  laser  array  chip  which  converts  the  bit-parallel 
electronic  data  to  its  corresponding  bit-parallel  optical  format,  an  input  lens  which  serves  as 
an  objective  lens  imaging  the  emitted  optical  bit-parallel  signal  pattern  onto  the  surface  of 
the  fiber  image  guide,  a  fiber  image  guide,  an  output  lens  which  magnifies  and  images  the 
transmitted  optical  data  pattern  to  an  output  plane,  an  output  optical  detector  array  chip 
which  converts  the  optical  data  pattern  into  its  electronic  format,  and  an  electronic  receiver 
board for which the original bit-parallel data is intended. The system also comprises the
opto-mechanical mountings and connectors depicted in the diagram. The image guide
can be any one of at least the following four types: the flexible fiber bundle type; the rigid
fiber bundle type, which can be bent only when heated to a certain temperature; the
rigid and unbendable graded-index glass type; and the flexible graded-index plastic or polymer
type.

Fig. 1: A one-way fiber image guide based bit-parallel optical data transmission system.
(Labeled components: 2D laser array chip, connector housing, fiber image guide, connector housing, 2D detector array chip.)

The input imaging lens could take the form of a conventional spherical lens, or it could
be a graded-index type planar surface lens, such as a SELFOC® rod lens. The individual
lasers  in  the  laser  array  can  be  arranged  in  a  two  dimensional  rectangular  configuration.  At 
the  output  connector  side,  a  magnified  image  of  the  transmitted  data  pattern  will  be  formed 
at the detector array chip. The magnification ratio, however, does not have to
exactly compensate the demagnification ratio at the input side of the image guide. In
practice, to minimize the amplification of noise, the spacing between two adjacent high-speed
individual detectors in a detector array will have to be kept larger than the spacing
between two consecutive lasers transmitting high speed data. The one-way transmission
system of Fig. 1 can be modified to accommodate two-way communications of bit-parallel
data using space-division or wavelength-division multiplexing techniques.

The following experiments were performed to confirm the proposed principles. To
begin with, rigid fiber bundle type and glass gradient index type image guides were used.
The rigid fiber bundle was acquired from Edmund Scientific. It has an overall effective
diameter of 3.2 mm and a length of 300 mm. The individual fiber pixels have an average
diameter of 12 µm. The gradient index glass image guide was acquired from NSG.
This rigid SELFOC® rod lens has a 280 mm length (a 4 pitch rod) and 1.3 mm diameter.
The input object contains 64 pixels in an 8x8 array format. The hole diameter and pitch are
1 mm and 2 mm, respectively. Illuminated by a HeNe (λ = 632.8 nm) laser, the object was
demagnified at a ratio of 8.5:1 before its image arrived at the surface of the rigid fiber
bundle. In the case of using the SELFOC® rod lens, the Fourier transform of the object is
formed at the output plane where the rod lens surface is placed. The reason to use the
Fourier transform of the object rather than the demagnified image of it is to test the angular
multiplexing capability of the SELFOC® lens. The results of the angle- and space-multiplexed
transmissions are shown in Fig. 2(a) and (b), respectively. Fig. 2(a) is
expected to contain visible side-lobe patterns due to a filtering effect on the Fourier
spectrum by the limited input fiber aperture. We have also tested bit-parallel optical
transmissions using a flexible bundle type fiber image guide of 1 m length. The guide has
an effective diameter of 0.5 mm and contains 6000 individual fiber pixels. As inputs, a
GaAs/AlGaAs  vertical  cavity  surface-emitting  laser  (VCSEL)  array  was  used.  16  individual 

lasers  (A,  =  980  nm)  in  a  4x4  array  format  with  a  laser  pixel  diameter  and  a  pitch  of  10  )im 
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(a) 


(b) 


Fig.2:  optical  angle-  and  space-multiplexed  8x8  bit-parallel  transmission  results. 

and  125  |xm,  respectively,  were  demagnified  at  a  ratio  of  3: 1  by  a  lens  before  their  images 
enter  the  image  guide.  A  magnifying  lens  is  used  to  deliver  the  output  to  a  CCD  camera  (see 
Fig.3(a)  and  (b)  for  the  captured  images  of  such  transmission).  In  addition,  measurements 
of  electric  cross-talks  between  adjacent  lasers  indicate  that  a  -30  dB  cross-talk  was 
maintained  for  a  laser  modulation  bandwidth  up  to  2  GHz.  Optical  cross-talks  occurred 
during  parallel  transmissions  were  our  primary  concerns.  Our  measurements,  however, 
confirm  that  such  nearest-neighbor  cross-talks  are  below  -30  dB  as  well.  Details  of  both 
measurements  will  be  shown  at  the  conference. 
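The geometry quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper name `demagnified_extent` is hypothetical, and it assumes square arrays whose outermost elements must fit inside the circular guide aperture (so the array diagonal is compared with the guide diameter).

```python
import math

def demagnified_extent(n_per_side, pitch_um, element_um, ratio):
    """Return (pitch, side length) in micrometres after demagnification.

    The side length runs from the outer edge of the first element to the
    outer edge of the last one in a row of n_per_side elements.
    """
    side_um = (n_per_side - 1) * pitch_um + element_um
    return pitch_um / ratio, side_um / ratio

# 8x8 mask: 1 mm holes on a 2 mm pitch, demagnified 8.5:1 onto the 3.2 mm rigid bundle.
mask_pitch, mask_side = demagnified_extent(8, 2000.0, 1000.0, 8.5)

# 4x4 VCSEL array: 10 um emitters on a 125 um pitch, demagnified 3:1 onto the 0.5 mm guide.
vcsel_pitch, vcsel_side = demagnified_extent(4, 125.0, 10.0, 3.0)

# Assumed fit criterion: the diagonal of the square image must be smaller
# than the effective diameter of the guide.
mask_fits = mask_side * math.sqrt(2) < 3200.0
vcsel_fits = vcsel_side * math.sqrt(2) < 500.0

print(f"mask image:  pitch {mask_pitch:.1f} um, side {mask_side:.1f} um, fits: {mask_fits}")
print(f"VCSEL image: pitch {vcsel_pitch:.1f} um, side {vcsel_side:.1f} um, fits: {vcsel_fits}")
```

Under these assumptions both demagnified inputs fit within their guides, and each image pitch spans many individual fiber pixels.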


Fig. 3: Optical 4x4 bit-parallel transmission results using a flexible fiber image guide, panels (a) and (b).

A reconfigurable optical interconnection which implements space-variant connections by using
a phase modulating type SLM is proposed. System configuration and experimental results are
shown.

Reconfigurable Space-Variant Optical Interconnection Using Binary CGH

Takayuki  ISHIDA  and  Masatoshi  ISHIKAWA 

Department  of  Mathematical  Engineering  and  Information  Physics, 

Faculty  of  Engineering,  University  of  Tokyo 

Abstract: A reconfigurable optical interconnection which implements space-variant connections by using a phase
modulation type SLM is proposed. The proposed method allows easier alignment than a conventional
crossbar switch. System configuration and experimental results are shown.


1  Introduction 

To accomplish reconfigurable interconnects between
arbitrary processors in parallel processing systems,
space-variant optical interconnection is one of
the most effective methods because light beams do
not interact when one passes through the other in
free space. An optoelectronic processing architecture
using space-invariant optical interconnection has
been implemented by Kirk et al.[1]; however, few
optoelectronic systems using space-variant
interconnection have been realized. The crossbar switch
is a conventional method for realizing space-variant
optical interconnection[2], but it lacks scalability in
terms of light intensity. Holographic interconnection is
another approach[3] for realizing space-variant optical
interconnection, and it keeps the scalability in terms of
light intensity. However, few reconfigurable
space-variant holographic interconnects have been realized,
mainly because of the limits and restrictions of the
devices for implementing reconfigurable holograms.

In this paper, a new type of reconfigurable
space-variant interconnection using a binary off-axis CGH
(Computer Generated Hologram) is proposed from
the viewpoint of realizability. The system design is
described in Section 2 and experimental results are
shown in Section 3.

2  System  Design 

A binary phase off-axis CGH is used for realizing a
holographic interconnection because an off-axis CGH
is less sensitive to phase modulation errors than
an on-axis CGH[4].

2.1  System  Configuration 

The system configuration is shown in Fig. 1. The
system consists of an input plane, an MLA (Micro
Lens Array), a phase modulating type SLM, a lens,
and an output plane. The input plane is an LD array
(Laser Diode array) and each LD is connected with a
PE (Processing Element) individually. Emitted light
from the LD array is collimated by the MLA and is
incident on the SLM. A CGH is implemented on the
SLM and the lens is adjoined to it. By updating the
CGH pattern on the SLM, the interconnection pattern
may be modified. The Fourier transform of the
CGH is obtained on the output plane through the
lens. On the output plane, each zeroth-order
diffraction is collected at a point which is located near
the PD array. Each PD is connected with a PE
individually.


MLA :  Micro  Lens  Array  PD :  Photo  Detector 

Fig.  1  System  configuration 

2.2  CGH 

Figure 2 shows the CGH pattern. A binary phase
off-axis CGH is used because a phase modulating type
CGH has higher diffraction efficiency than an
intensity modulating type. The whole CGH pattern
consists of CGH unit-patterns, and a unit-pattern
consists of CGH primary-patterns. A unit-pattern
is assigned to one LD spot as shown in Fig. 2, and
consists of Q x Q repeated primary-patterns.
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Fig.  2  CGH  pattern 

A primary-pattern consists of M x M binary pixels.
An off-axis CGH is less sensitive to phase modulation
errors than an on-axis CGH[4], so that an
off-axis CGH can be easily implemented. A binary
CGH is proposed because SLMs which can perform
multilevel modulation are currently more difficult to
fabricate. Generally speaking, an off-axis CGH
requires a large diffraction angle so that the zeroth-order
diffraction does not lie in the PD array area. However,
this system allows a small diffraction angle since the
zeroth-order diffraction is collected at one point.

2.3  Comparison  with  Crossbar  Switch 

An optical crossbar switch is a straightforward,
conventional method for realizing space-variant
interconnection which uses optical matrix-vector products.
The proposed method has several advantages over
the crossbar switch.

One advantage is the scalability in terms of light
intensity. In a crossbar switch, multiple images of the
LD array are generated, and the output intensity
decreases as the number of PDs increases. The
proposed method, on the other hand, uses holography, so
that the output intensity is independent of the number
of PDs.

A second advantage is that the proposed method
provides better tolerance for lateral shift of the
incident beam. As shown in Fig. 3, the system is
misaligned only when the incident beam from an LD
is outside the CGH unit-pattern (s x s in Fig. 3(a)),
because the optical Fourier transform is shift invariant
in terms of output intensity. In Fig. 3, the ratio
of the margin for alignment to the size of the CGH area
is denoted t. We call this margin the 'alignment
tolerance'; it is denoted ts in Fig. 3(a).
The alignment tolerances can be compared as shown
in Table 1. In Case 1 in Table 1, the same size of
one pixel (= a x a) is assumed, and in Case 2, the
same area of the SLM (= L x L) is assumed. To
calculate the numbers, the following parameters are
used: t = 0.1, M = 64, Q = 2.5, P = 16, a = 10 µm, and
L = 25.6 mm. Table 1 shows that the proposed
method provides better tolerance for lateral shift of
the incident beams in both cases.
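The shift invariance of the Fourier intensity that this argument relies on can be illustrated with a small discrete analogue. The sketch below is an assumption-laden toy: it uses a one-dimensional binary pattern, a plain discrete Fourier transform, and a circular shift (which mimics a periodically repeated primary-pattern), not the two-dimensional optical system of the paper. The function name `dft_intensity` is hypothetical.

```python
import cmath
import math

def dft_intensity(x):
    """Return the squared magnitude of the DFT of sequence x."""
    n = len(x)
    return [
        abs(sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))) ** 2
        for f in range(n)
    ]

# Binary phase pattern: +1 stands for phase 0, -1 for phase pi.
pattern = [1, -1, -1, 1, 1, 1, -1, 1]

# Lateral shift of the pattern by three pixels (circular, i.e. periodic pattern).
shifted = pattern[3:] + pattern[:3]

original_intensity = dft_intensity(pattern)
shifted_intensity = dft_intensity(shifted)

# A shift only multiplies each Fourier component by a phase factor,
# so the intensities agree up to rounding error.
max_diff = max(abs(p - q) for p, q in zip(original_intensity, shifted_intensity))
print(f"largest intensity difference: {max_diff:.2e}")
```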


Fig. 3 Correct alignment and misalignment


Table 1 Alignment tolerance

          Crossbar switch       Proposed system
Case 1    [illegible]           tQMa (= 160 µm)
Case 2    tL/P^2 (= 10 µm)      tL/P (= 160 µm)
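The legible entries of Table 1 follow directly from the parameters given in the text. The short check below uses only those stated values (t = 0.1, M = 64, Q = 2.5, P = 16, a = 10 µm, L = 25.6 mm); the variable names are illustrative, and the crossbar entry for Case 1 is not computed because it is not legible in the source.

```python
# Parameters quoted in the text (lengths in micrometres).
t = 0.1          # ratio of alignment margin to CGH size
M = 64           # pixels per side of a primary-pattern
Q = 2.5          # primary-patterns per side of a unit-pattern
P = 16           # P x P array of processing elements
a_um = 10.0      # pixel size
L_um = 25600.0   # SLM side length (25.6 mm)

proposed_case1_um = t * Q * M * a_um    # tQMa
crossbar_case2_um = t * L_um / P ** 2   # tL/P^2
proposed_case2_um = t * L_um / P        # tL/P

print(f"proposed, Case 1: {proposed_case1_um:.0f} um")
print(f"crossbar, Case 2: {crossbar_case2_um:.0f} um")
print(f"proposed, Case 2: {proposed_case2_um:.0f} um")
```

With these parameters the unit-pattern side QMa equals L/P (1.6 mm), which is why the proposed system has the same tolerance in both cases.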

A third advantage is the number of required
alignment spots. In order to achieve an interconnection
between the LDs and PDs with a crossbar switch, P^4
alignment spots are required on the SLM, whereas the
proposed method requires only P^2 alignment spots.
The proposed system is therefore easier to realize.

3  Experimental  Results 

The experimental system is shown in Fig. 4. A
collimated beam from a 632.8 nm He-Ne laser passed
through a mask pattern was used as a source array, because
an LD array suitable for the setup was not
available. A binary CGH pattern was
displayed on an LCD (Liquid Crystal Display). The
LCD has 640 x 400 pixels, each of which is 300 µm in
size. The LCD was illuminated with incoherent white
light and imaged onto a PAL-SLM (Parallel Aligned
nematic Liquid crystal SLM) developed by
Hamamatsu Photonics K.K. Reduction optics were used so
that each CGH pixel on the LCD is 15 µm
in size on the PAL-SLM. An intensity pattern at the
write side of the PAL-SLM is transferred to a phase
modulation of the read beam. Light reflected from
the read side of the PAL-SLM was brought to a
focus with a 300 mm lens. The maximum diffraction
angle was 0.6°, and the assumed PD array pitch was
197.75 µm.

Fig. 4 Experimental setup
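The quoted PD pitch is consistent with the spacing of the diffraction orders of a periodic hologram in the focal plane of a lens, λf/Λ, where Λ is the period of one primary-pattern. The sketch below assumes that relation and takes M = 64 pixels per primary-pattern from elsewhere in the paper. Relating the 0.6° figure to the angle subtended by 16 pitches is only one possible reading, not something the text states.

```python
import math

wavelength_um = 0.6328       # He-Ne read beam
focal_length_um = 300e3      # 300 mm Fourier lens
pixel_um = 15.0              # CGH pixel size on the PAL-SLM
M = 64                       # pixels per side of a primary-pattern

period_um = M * pixel_um                                  # grating period of the repeated pattern
pitch_um = wavelength_um * focal_length_um / period_um    # spacing of diffraction orders

# Assumed reading: the angle subtended at the lens by 16 output pitches.
angle_deg = math.degrees(math.atan(16 * pitch_um / focal_length_um))

print(f"primary-pattern period: {period_um:.0f} um")
print(f"diffraction-order spacing: {pitch_um:.2f} um")
print(f"angle for 16 pitches: {angle_deg:.2f} deg")
```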
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Fig.  5  Designed  CGHs 
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Fig.  6  Experimental  results 


Figure 5 shows the designed CGHs. The six letters
in Fig. 5 denote six different light sources, and
Fig. 6(a) shows the ideal output pattern, in which
the letters correspond to the letters of the
light sources in Fig. 5. Although a 16 x 16 PE
array was assumed, only 3 x 2 light sources were
implemented due to the limited number of pixels of
the LCD. The CGH primary-patterns were designed
by simulated annealing[5] and each primary-pattern
has 64 x 64 pixels. Each CGH unit-pattern consists of
2.5 x 2.5 primary-patterns horizontally and vertically
(i.e., Q = 2.5), but the effective area which the
incident light illuminates was equivalent to Q = 2. The
condition Q = 2 was supported by theoretical
considerations and experimental results. The effective areas
of the CGHs are indicated by white circles in Fig. 5
and the obtained results are shown in Fig. 6(b).
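To make the design step concrete, the following is a minimal sketch of simulated annealing applied to a binary phase pattern. It is not the procedure of reference [5]: the cost function (fraction of power sent into one chosen diffraction order), the linear cooling schedule, the 16 x 16 pattern size, and the function name `design_binary_cgh` are all assumptions chosen to keep the example short.

```python
import cmath
import math
import random

def design_binary_cgh(m=16, order=(3, 2), iters=4000, t0=1e-3, seed=0):
    """Anneal an m x m binary phase pattern (+1: phase 0, -1: phase pi).

    Returns (pattern, efficiency), where efficiency is the fraction of the
    incident power diffracted into the target order of the periodic pattern.
    """
    rng = random.Random(seed)
    u, v = order
    # Fourier kernel of the target diffraction order at every pixel.
    kernel = [[cmath.exp(-2j * math.pi * (u * i + v * j) / m) for j in range(m)]
              for i in range(m)]
    sign = [[rng.choice((1, -1)) for _ in range(m)] for _ in range(m)]
    n = m * m
    total = sum(sign[i][j] * kernel[i][j] for i in range(m) for j in range(m))
    eff = abs(total / n) ** 2
    best_eff, best = eff, [row[:] for row in sign]

    for k in range(iters):
        temp = t0 * (1 - k / iters)              # linear cooling, always > 0
        i, j = rng.randrange(m), rng.randrange(m)
        trial = total - 2 * sign[i][j] * kernel[i][j]   # effect of flipping one pixel
        new_eff = abs(trial / n) ** 2
        # Metropolis rule: always accept improvements, sometimes accept losses.
        if new_eff >= eff or rng.random() < math.exp((new_eff - eff) / temp):
            sign[i][j] = -sign[i][j]
            total, eff = trial, new_eff
            if eff > best_eff:
                best_eff, best = eff, [row[:] for row in sign]
    return best, best_eff

pattern, efficiency = design_binary_cgh()
print(f"efficiency into the target order: {efficiency:.3f}")
```

A binary phase pattern necessarily produces a twin order of equal strength, which is why the efficiency of a single order stays well below one however long the annealing runs.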

4  Conclusion 

A new type of holographic interconnection for
realizing reconfigurable space-variant optical
interconnection is presented. A binary off-axis CGH is
proposed from the viewpoint of realizability. The
advantages of the system are the ease of alignment and the
high realizability. Experimental results using a
PAL-SLM are shown.

The authors would like to thank Hamamatsu
Photonics K.K. for their assistance with the PAL-SLM.
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The  tremendous  progress  in  high  performance 
Very-Large-Scale Integrated circuit (VLSI) technology
has made possible the incorporation of several million
transistors onto a single silicon chip with on-chip clock
rates of 200 Megahertz (MHz). By the end of the decade,
the integration density for silicon Complementary Metal
Oxide Semiconductor (CMOS) is expected to be over
20 million transistors and the projected on-chip clock
rate is 500 MHz. This enormous bandwidth that will be
available for computation and switching on a silicon
integrated circuit will create a huge bottleneck for Input
and Output (I/O) to the VLSI circuit. Technologies
being developed at AT&T Bell Laboratories now
exist for attaching GaAs Multiple Quantum Well
(MQW) photodetectors and light modulators onto a
prefabricated silicon integrated circuit using a
well-established hybrid flip-chip bonding technique followed
by substrate removal of the GaAs chip to allow
surface-normal operation of the optical modulators at 850 nm
[1]. From a systems point of view, the demands made
of an optoelectronic integration method are (i) that the
silicon integrated circuit be state-of-the-art, (ii) that the
circuit be unaffected by the integration process, and (iii) that
the design and optimization of the circuit proceed
independently of the placement and bonding to the
optical I/O. The first two goals have been achieved in
reference 1, and this technique has been effectively
applied to simple switching nodes for a smart-pixel
based photonic switch in reference 2. In this paper we
further achieve the third goal by demonstrating for the
first time that modulators can be bonded directly above
active submicron CMOS transistors (figure 1), and by
applying the technique to the demonstration of a
high-density 2 Kbit first-in first-out (Fifo) page buffer circuit.

The final structure of the optoelectronic circuit is
shown in figure 1, which shows the GaAs/AlGaAs
MQW modulator bonded to the silicon circuit, possibly
directly over a transistor gate. To demonstrate the
feasibility of this concept, we have fabricated a
low-area receiver-transmitter circuit that consists of a
transimpedance receiver circuit with one gain stage
(figure 2a). The receiver consists of an input stage
connected to the MQW photodetector and biased. The
transimpedance feedback to this stage is accomplished
using a parallel combination of a diode-connected nmos
device (gate attached to drain) with a saturated pmos
device to enable a high dynamic range. No equalization
stages are used. A single gain stage is used to restore the
detected signal to logic levels. The transmitter circuit
consists of an inverter with its output connected to the
p contact of the MQW device. SPICE simulations of
the circuit operating at 375 Mb/s with NRZ bit patterns
are shown in figure 2b. Approximately 50 µW of optical
power was required to switch the circuit; this
measurement agreed with DC SPICE simulations of
24 µA switching currents. The receiver-transmitter pair
was operated at 375 Mb/s bit rates with measured
switching energies of approximately 370 fJ. The area of
the circuit, including all wiring, was approximately
[value illegible] µm² in a 0.8 µm CMOS technology.

We then used the technique described above toward
the implementation of a high-density Fifo memory
circuit. The basic Fifo circuit is a useful tool in
designing switching networks and other data-flow and
signal-processing architectures in that it can provide
non-volatile storage (buffering),
asynchronous-to-synchronous conversion, and bandwidth conversion
between its input and output data streams. The Fifo is
made up of a number of component cells. The
fundamental building block of the Fifo is the bit cell
(figure 3). The cells are placed side by side in an
orderly pattern to make a Fifo of any size. The number
of columns and rows in the Fifo buffer memory
corresponds to the number of pages and the number of
bits per page, respectively. The bit cell is a single memory
element consisting of two pass gates and three inverters.
The states of the two pass gates together determine whether
the bit cell is in the "store" or the "load" mode.

The pass gates are controlled by a pair of select
lines that run vertically across all input bits in a column.
There is a buffer cell consisting of two large inverters in
series located at the bottom of each column. The
control logic for the Fifo is located below the buffers
(figure 3). The control logic consists of three Nand
gates. The first two Nand gates are cross-coupled as a
set-reset latch. The output of the latch must pass through
the five-input Nand gate before it affects the state of the
Fifo cell. The five-input Nand gate determines the state
of the pass gates in the Fifo cells. Each column can
either load or store data.

The controller for a single column in the middle of
the Fifo operates as follows. If the column is empty and
the previous column is empty, the controller remains in a
stable state of Empty (font = zero) and Storing data. If
the column is Empty and the previous column has data
and is in the Store state, then the column goes into the
Load state, the latch sets to Not Empty, and the column
transitions back to the Store state. If the column contains
data and is Not Empty, then it remains in this state until
either a reset signal is received or the next column
performs a shift-out. If either occurs, then the column is
designated as Empty and as Storing data.

When reset is low, it forces all control cells to a
known state. The latch is cleared to show the cell is
empty. The reset line is connected to one of the inputs
of the five-input Nand gate. Reset forces the output of the
Nand gate to '1', which puts the associated column of
Fifo data in the store state. The first and last columns of
the Fifo page buffer are slightly different. The state of
the first and last columns of bit cells can be read out
through the "Iready" and "Oready" control lines,
respectively. There are also two more electrical control
lines, "Shift-in" and "Shift-out", that respectively force
data to enter and exit the Fifo. Valid data may be
shifted into and out of the Fifo only when the
appropriate Iready or Oready control line is "high".
Note that data can be shifted into and out of the Fifo
independently (simultaneously if necessary).
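The column-level behavior described above can be summarized in a small behavioral model. This is a sketch under stated assumptions, not the gate-level circuit: each column holds one page, `None` stands for a latch that reads Empty, and pages ripple toward the output whenever the next column is Empty. The class name `RippleFifo` and its method names are hypothetical, apart from the Iready, Oready, Shift-in, Shift-out, and reset signals named in the text.

```python
class RippleFifo:
    """Behavioral model of a self-timed Fifo: one slot per column."""

    def __init__(self, depth=32):
        self.cols = [None] * depth      # None: column latch reads Empty

    @property
    def iready(self):
        """High when the first column is Empty and can accept a page."""
        return self.cols[0] is None

    @property
    def oready(self):
        """High when the last column holds a valid page."""
        return self.cols[-1] is not None

    def settle(self):
        """Let pages ripple toward the output until no column can load."""
        moved = True
        while moved:
            moved = False
            for i in range(len(self.cols) - 1, 0, -1):
                # An Empty column loads from a previous column that has data;
                # the previous column is then designated Empty.
                if self.cols[i] is None and self.cols[i - 1] is not None:
                    self.cols[i], self.cols[i - 1] = self.cols[i - 1], None
                    moved = True

    def shift_in(self, page):
        if not self.iready:
            raise RuntimeError("Iready is low: first column is not Empty")
        self.cols[0] = page
        self.settle()

    def shift_out(self):
        if not self.oready:
            raise RuntimeError("Oready is low: last column is Empty")
        page, self.cols[-1] = self.cols[-1], None
        self.settle()
        return page

    def reset(self):
        """Clear every latch so that all columns read Empty."""
        self.cols = [None] * len(self.cols)


fifo = RippleFifo(depth=4)
for page in ("page0", "page1", "page2"):
    fifo.shift_in(page)
first_out = fifo.shift_out()
print(first_out, fifo.cols)
```

Pages leave in the order they entered, and input and output operations are gated only by the state of the first and last columns, which is what allows the two sides to run independently.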

We have recently fabricated a circuit cell that
consists of a 2 Kbit array of 64 first-in first-out (Fifo)
buffer channels, each of which is 32 bits deep. Thirty-two
of the channels were connected to detectors, receiver circuits,
modulators, and modulator driver circuits for optical
testing of the Fifo. The Fifo was implemented in a
3-level-metal 0.8 µm CMOS process. All transistors and
routing for the Fifo were implemented using only two levels
of metal. The third level of metal over the Fifo was
used solely for the flip-chip bonding pads and for
connections from these pads to the underlying circuits. The
bonding pads for the photodetectors and the modulators
are in the center of the array, directly over the active
circuits of the Fifo. Each modulator requires two pads:
one for the n-contact and the other for the p-contact.
Figure 4 shows the array of modulators
bonded directly over the active circuits of the Fifo.
Note that the design method described here allows the
photodetectors and the modulators to be placed in
arbitrary and potentially different grid patterns,
according to the convenience of the optical system that
conveys the light beams to and from the chip. This was
achieved in the prototype chip by using a large array of
MQW devices, with a 4x8 sub-array of these
devices used for the photodetectors and a 16x2 sub-array
used for the modulators. The entire photonic Fifo
circuit incorporates over 21K transistors in an
850 µm x 950 µm area (including control circuitry and
wiring), corresponding to a circuit density of
approximately 40 µm²/transistor. This represents an
order of magnitude improvement in circuit density over
previous smart pixel circuits. Based on this integration
density, a full-scale system prototype could provide a
200 Kbit photonic page buffer.
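The capacity and density figures in this paragraph can be reproduced from the stated numbers. The check below treats "over 21K transistors" as exactly 21,000, which is an assumption, so the computed area per transistor is an upper bound on the true figure.

```python
channels = 64
bits_per_channel = 32
total_bits = channels * bits_per_channel          # 2 Kbit array

area_um2 = 850 * 950                               # circuit area in square micrometres
transistors = 21_000                               # assumed: "over 21K" taken as 21,000
area_per_transistor = area_um2 / transistors       # square micrometres per transistor

print(f"capacity: {total_bits} bits")
print(f"area per transistor: {area_per_transistor:.1f} um^2")
```

The result, a little under 40 µm² per transistor, agrees with the approximate density quoted in the text.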

After bonding, the chip was operated electrically
to ensure that the circuits performed as
expected after the bonding process, and to characterize
the maximum speed of operation of the loaded circuits.
Operation of the Fifo involves the shifting of bits
through all 32 shift registers of the Fifo. Correct
operation was achieved using a custom-built test board.
This test board contained an EPROM to store the test
program; this code was then loaded onto an FPGA.
The FPGA then controlled the chip on an optical bench,
supplying all the electrical signals and the clock
needed to test the operation of the Fifo. Outputs were
displayed on LEDs for visual inspection. This
confirmed that the electrical performance of the Fifo
buffer was unaffected by the bonding operation.

The test bench was also set up for optical
testing of the Fifo data-buffer circuit, one channel at a
time. Two high-speed laser diodes were used in the
setup: one for input to the Fifo data channels, and one
for reading out the stored data. The wavelength of the
input laser was approximately 851.5 nm and that of the
readout laser was approximately 852 nm. A reverse bias
of 10 V was applied to the detectors and the modulators.
The required optical power for a logic "1" was 50 µW
per channel. Thirty-two bits of data were loaded into a specific
Fifo channel by modulating the input laser diode to
provide the data and by toggling the electrical shift_in
control line. The data were then transferred to the output
line by toggling the electrical shift_out control line. The
output beam was focused onto a detector and
simultaneously imaged onto a CCD for visual
inspection. The output data were observed as an
intensity modulation of the readout laser beam, with a
contrast ratio of 2:1. The applied voltage swing across
the modulators was approximately 5 V. Speed tests were
also performed on the individual Fifo data channels.
The data transfer rate through the photonic Fifo was
measured at approximately 50 MHz per channel. This
transfer rate was limited by the electrical board and
ribbon connecting the electrical shift_in and shift_out
control signals to the chip. SPICE simulations indicate
that a transfer rate of 200 MPages/s can be achieved
with the above circuit.

Speed tests of the flip-chip process were also
performed by measuring the oscillation frequencies of
probed ring oscillator circuits. We measured the
switching frequencies of bare (unity fanout) and loaded
ring oscillators (i.e., each inverter in the ring attached
to pads and modulators) before and after
bonding to examine the effects of capacitive loading.
Switching frequencies of 6.3 GHz (158 ps delay) were
measured for the single unity-fanout inverter; switching
frequencies of 2.57 GHz (389 ps delay) were measured
for the inverter loaded with pads, wiring to modulators,
and deposited barrier metal; and the speed of the loaded
inverter after bonding (loaded with the wire, pad, and
bonded modulator) was measured at 2.08 GHz.
Simulations indicate that the additional delay
corresponds to a capacitive load of approximately
32-35 fF. These results suggest that the loading of the
circuits with the bond and any additional wire for
remote wiring will not be the limiting factor in system
performance, and that the high-density CMOS/MQW
flip-chip bonded smart pixel circuits as described here
may be used for high-performance systems.
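The delays quoted in parentheses are the reciprocals of the corresponding switching frequencies, which makes it easy to estimate the delay added by bonding. The sketch below assumes that same reciprocal relation also holds for the 2.08 GHz post-bonding figure, for which the text gives no delay; the helper name `delay_ps` is illustrative.

```python
def delay_ps(freq_ghz):
    """Delay in picoseconds taken as the reciprocal of a frequency in GHz."""
    return 1e3 / freq_ghz

bare_ps = delay_ps(6.3)      # unity-fanout inverter
loaded_ps = delay_ps(2.57)   # loaded with pads, wiring, and barrier metal
bonded_ps = delay_ps(2.08)   # loaded inverter after bonding (assumed same relation)

added_by_bonding_ps = bonded_ps - loaded_ps

print(f"bare: {bare_ps:.1f} ps, loaded: {loaded_ps:.1f} ps, bonded: {bonded_ps:.1f} ps")
print(f"delay added by bonding: {added_by_bonding_ps:.1f} ps")
```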
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Figure  1:  Structure  of  the  hybrid  GaAs  MQW/silicon 
CMOS  circuit.  Modulators  may  be  bonded  directly  on 
top  of  active  gates. 



Figure  2:  (a)  Transimpedance  receiver  circuit  diagram, 
(b)  SPICE  simulations  of  10110100  Bit  pattern  at 
375Mb/s.  The  circuit  was  experimentally  verified  at  this 
data  rate. 


Figure  3:  Fifo  circuit  schematic  showing  bit  cell,  bit 
rows  and  columns,  and  control  circuitry 


Figure 4. Microphotograph of the 0.8 µm Fifo circuit
after bonding and substrate removal. Pads are
15 µm x 15 µm with 15 µm spacing. Modulators are
located on 60 µm centers. Modulators are bonded
directly on top of the active circuits of the Fifo.
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